*"How do we label, tag, and/or compile this data into a usable format?"*
If you are looking for long-term usability of the data together with
*cost-effective* data storage, I recommend RDF. You can identify an
appropriate ontology (.owl) and use it to tag/label the fields. Having
looked at the Excel sheets provided by Ragini, the data seems well sorted
and could be classified distinctly using an ontology. The RDF format also
saves you the expense of a proprietary relational database (in case you
don't need one!).
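
As a rough sketch of what tagging a row could look like, the snippet below turns one spreadsheet-style record into N-Triples lines using only the Python standard library. The namespace `http://example.org/water#`, the column names, and the sample values are all made up for illustration; a real setup would take its terms from whichever ontology you settle on.

```python
from urllib.parse import quote

# Hypothetical namespace; replace with terms from the ontology you choose.
NS = "http://example.org/water#"
XSD = "http://www.w3.org/2001/XMLSchema#"


def literal(value):
    """Format a Python value as an N-Triples literal."""
    if isinstance(value, bool):
        raise TypeError("booleans are not handled in this sketch")
    if isinstance(value, int):
        return f'"{value}"^^<{XSD}integer>'
    if isinstance(value, float):
        return f'"{value}"^^<{XSD}double>'
    escaped = (
        str(value)
        .replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def row_to_ntriples(row_id, row):
    """Turn one record (a dict of column -> value) into N-Triples lines."""
    subject = f"<{NS}observation/{row_id}>"
    lines = []
    for column, value in row.items():
        if value is None or value == "":
            continue  # a missing cell simply produces no triple
        predicate = f"<{NS}{quote(column)}>"
        lines.append(f"{subject} {predicate} {literal(value)} .")
    return lines


# Made-up sample row, not taken from the actual sheets.
sample = {
    "location": "Village A",
    "parameter": "rainfall_mm",
    "value": 12.5,
    "month": "2005-05",
    "notes": "",
}
triples = row_to_ntriples(1, sample)
print("\n".join(triples))
```

One thing this makes visible is that missing values need no placeholder in RDF: an empty cell just means the triple is absent, which fits data where parameters come and go over time.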

"*We'd like to have a robust but simple-to-use system that can handle 
complex queries in a way that is easy to understand for laypeople.*"
On top of the RDF data you can set up a SPARQL query engine, which will run
queries against the specified data fields. A conventional web-based front
end with a SPARQL process in the back end can take care of this, so that
users never have to write queries themselves. There are supporting
libraries in PHP, Java, etc.
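
To make the front-end idea a bit more concrete, here is a minimal sketch of the translation step: a function that takes the kind of values a layperson might pick in a web form and builds a SPARQL query string from them. The function name `build_query`, the `w:` prefix, and the property names are assumptions for illustration, and months are assumed to be stored as plain "YYYY-MM" strings so that string comparison orders them correctly. Sending the query to an actual SPARQL engine is left out.

```python
NS = "http://example.org/water#"  # hypothetical namespace


def escape(text):
    """Escape backslashes and double quotes for a SPARQL string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def build_query(parameter, location=None, start_month=None, end_month=None):
    """Translate simple form fields into a SPARQL SELECT query string."""
    patterns = [
        "?obs w:parameter ?parameter ;",
        "     w:location ?location ;",
        "     w:month ?month ;",
        "     w:value ?value .",
    ]
    # Only the fields the user actually filled in become filter conditions.
    filters = [f'?parameter = "{escape(parameter)}"']
    if location:
        filters.append(f'?location = "{escape(location)}"')
    if start_month:
        filters.append(f'?month >= "{escape(start_month)}"')
    if end_month:
        filters.append(f'?month <= "{escape(end_month)}"')
    condition = " && ".join(filters)
    body = "\n  ".join(patterns + [f"FILTER({condition})"])
    return (
        f"PREFIX w: <{NS}>\n"
        "SELECT ?location ?month ?value WHERE {\n"
        f"  {body}\n"
        "}\n"
        "ORDER BY ?month"
    )


query = build_query(
    "rainfall_mm",
    location="Village A",
    start_month="2005-05",
    end_month="2008-12",
)
print(query)
```

The user only ever sees drop-downs for parameter, location, and date range; the query text stays behind the scenes.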


On Monday, 14 January 2013 07:19:13 UTC, Nisha wrote:
>
> Hi everyone,
>
> At India Water Portal, my colleagues Ragini, Bala and I are working on a 
> data project with Keystone Foundation <http://keystone-foundation.org/> in 
> the Nilgiris, Tamil Nadu (where Bala works). Keystone's programs are 
> focused on the intersection of livelihoods, enterprise, and environmental 
> conservation, and it works mostly with the indigenous tribal population in 
> the area. 
>
> THE DATA
>
> There are several different kinds of data. The major categories are 
> weather (humidity, temperature, rainfall, wet days), water quality, 
> sediment rate, land use, and community water supply/systems. Each of these 
> has a location (a village, a station, or a region), and most also have a 
> date range. The data that is easily correlated (i.e. data from the same 
> time period and the same location) is often in different Excel sheets. 
>
> We have tried to aggregate this data by location by individually going 
> through each sheet and figuring out a) the data type, b) the location, and 
> c) the date range. I've also aggregated some of the water quality data 
> (from the DFiD study and the every-other-month data, May 2005-Dec 2008, 
> from the Sigur water project) into one Excel sheet.
>
> Almost all the data is in Excel sheets, but there may be some additional 
> data buried in Word documents. Data is missing in some places and the 
> parameters change over time, especially during longer date ranges.
>
> THE PROBLEM
>
> How do we label, tag, and/or compile this data into a usable format? Right 
> now, the data is in different formats, with different headings, 
> misspellings, and inconsistent formatting. Google Refine can help with some 
> of these tasks, but the general problem is larger: there are many variables 
> and parameters, so what is the best way to organize all of this 
> information? We'd like to have a robust but simple-to-use system that can 
> handle complex queries in a way that is easy to understand for laypeople. 
> We're unclear on what exactly can be done with this data, and organizing it 
> properly will go a long way towards helping us conceptualize possibilities.
>
>
> Any tools that you guys know about that could help us out with this?
>
> Nisha
> -- 
> Nisha Thompson
> Mobile: 962-061-2245
>
>  

-- 
For more details about this list
http://datameet.org/discussions/
--- 
You received this message because you are subscribed to the Google Groups 
"datameet" group.
To unsubscribe from this group, send email to 
datameet+unsubscr...@googlegroups.com.