Kelly,

The answer is to use an RDF query language (SPARQL is the obvious  
choice), and a tool that automatically translates the RDF queries to  
SQL.

There are two kinds of these tools:

First, those that present an RDF view or RDF dumps of an existing  
database schema (like the first table from your example). The D2R  
Server/D2RQ suite mentioned by Thomas is a popular choice (better  
links: [1], [2]). SquirrelRDF [3] is another, and OpenLink Virtuoso  
seems to have some support for this too [4]. Eric Prud'hommeaux did  
some work on extending MySQL with SPARQL capabilities [5].

Second, there are tools that use a schema optimized for storing RDF,  
often similar to the resource-predicate-value table in your second  
example. Many RDF libraries and frameworks such as Jena [6], Sesame  
[7] and RAP [8] come with such a "triple store". 3store [9] is  
another example optimized for MySQL. Oracle has built-in support for  
this (though they don't do SPARQL but invented their own query  
language) [10].

There's been a lot of work on these issues in the RDF and database  
communities. A random link dump can be found at [11].

Yours,
Richard

[1] http://sites.wiwiss.fu-berlin.de/suhl/bizer/d2r-server/
[2] http://sites.wiwiss.fu-berlin.de/suhl/bizer/D2RQ/
[3] http://jena.sourceforge.net/SquirrelRDF/
[4] http://virtuoso.openlinksw.com/wiki/main/Main/VOSSQLRDF
[5] http://xtech06.usefulinc.com/schedule/paper/156
[6] http://jena.sourceforge.net/
[7] http://www.openrdf.org/
[8] http://sites.wiwiss.fu-berlin.de/suhl/bizer/rdfapi/
[9] http://www.aktors.org/technologies/3store/
[10] http://www.oracle.com/technology/tech/semantic_technologies/
[11] http://del.icio.us/tag/sparql2sql


On 1 Feb 2007, at 03:13, Kelly Jones wrote:

> I have some data in a regular MySQL table called usplaces:
>
> city         |state|country|latitude |longitude  |population|comments
> -------------+-----+-------+---------+-----------+----------+--------
> New York     |NY   |USA    |40.704234| -73.917927|8008278   |Big Apple
> Chicago      |IL   |USA    |41.840675|-87.679365 |2896016   |Windy  
> City
> San Francisco|CA   |USA    |37.759881|-122.437392|776733    |City  
> by the Bay
> [pretend there's lots more rows here]
>
> I can now make queries like this:
>
> "All information about cities between 35 and 40 degrees latitude, -90
> and -70 longitude with population over 10K":
>
> SELECT * FROM usplaces WHERE (latitude BETWEEN 35 AND 40) AND
> (longitude between -90 AND -70) AND (population>10000);
>
> "All states that have at least one city larger than 1M":
>
> SELECT DISTINCT state FROM usplaces WHERE population>1000000;
>
> I now convert the data to RDF format (a different MySQL table called
> usplaces_rdf) as follows (I'm cheating slightly: "Chicago" in the
> first column means "the URI representing the city of Chicago", and
> "IL" in the third column means "the URI representing the state of
> Illinois"):
>
> key    |relation  |value
> -------+----------+-----
> Chicago|state     |IL
> Chicago|country   |USA
> Chicago|geo:lat   |41.840675
> Chicago|geo:lon   |-87.679365
> Chicago|population|2896016
> Chicago|comments  |Windy City
> [lots more rows here, 6 rows for each row in the original table]
>
> [There are many other much more efficient ways to convert to RDF, this
> is just an example]
>
> I can now run the first query above ("All information about cities
> between 35 and 40 degrees latitude, -90 and -70 longitude with
> population over 10K") as something like:
>
> SELECT us1.* FROM usplaces_rdf us1
>  LEFT JOIN usplaces_rdf us2 ON (us1.key=us2.key AND  
> relation='geo:lat')
>  LEFT JOIN usplaces_rdf us3 ON (us2.key=us3.key AND  
> relation='geo:lon')
>  LEFT JOIN usplaces_rdf us4 ON (us3.key=us4.key AND  
> relation='population')
> WHERE
>  (us2.value BETWEEN 35 AND 40) AND
>  (us3.value BETWEEN -90 AND -70) AND
>  (us4.value > 10000);
>
> [untested, but should be fairly close].
>
> I'm sure I could get my 2nd result ("All states that have at least one
> city larger than 1M") with an even more complex query.
>
> RDF seems to have many advantages (eg, multiple values for a given
> field and the ability to store "objects" in a column, not just
> values), but the queries seem long and tedious. They're probably
> efficient, but hard are hard to write.
>
> Has anyone written a simpler query language for RDF data stored in an
> SQL table? Are there better ways of storing/searching RDF data?
>
> [I sense there's a deep connection between SQL and RDF, but haven't
> figured it out yet]
>
> -- 
> We're just a Bunch Of Regular Guys, a collective group that's trying
> to understand and assimilate technology. We feel that resistance to
> new ideas and technology is unwise and ultimately futile.
> _______________________________________________
> General mailing list
> [email protected]
> http://simile.mit.edu/mailman/listinfo/general
>

_______________________________________________
General mailing list
[email protected]
http://simile.mit.edu/mailman/listinfo/general

Reply via email to