Hi Chris

Mattmann, Chris A (388J) wrote:
>> In my head, SIS (which I do not know very well) is a low-level geo indexing
>> library which could be used to provide the indexing capability for a GeoSPARQL
>> implementation.
> 
> Yeah that's what I was thinking too.

Ok.

I've played with Lucene spatial capabilities for this sort of thing in the
past. My knowledge of Apache SIS is very limited. In particular, it is not
clear to me how/when things are persisted on disk. My impression is that SIS
loads the entire index into RAM when it starts and serializes it out at the end.
Am I right? (I hope not. :-))

If that is the case, it could be an issue for large indexes.
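
To make the concern concrete, here is a minimal sketch of the pattern I have in
mind. It is NOT SIS code and I have not checked how SIS actually does it: the
class InMemoryQuadTree and its methods toBytes/fromBytes are hypothetical names
of mine. It shows a point quadtree that lives entirely in memory and can only
be persisted by serializing the whole tree in one go.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not SIS code: an in-memory point quadtree that is
// persisted only by serializing the entire structure at once.
class InMemoryQuadTree implements Serializable {
    private static final long serialVersionUID = 1L;
    private static final int CAPACITY = 4;   // points per leaf before it splits
    private static final int MAX_DEPTH = 16; // stops endless splitting on duplicates

    private final double minX, minY, maxX, maxY;
    private final int depth;
    private final List<double[]> points = new ArrayList<>();
    private InMemoryQuadTree[] children;     // null while this node is a leaf

    InMemoryQuadTree(double minX, double minY, double maxX, double maxY) {
        this(minX, minY, maxX, maxY, 0);
    }

    private InMemoryQuadTree(double minX, double minY, double maxX, double maxY, int depth) {
        this.minX = minX; this.minY = minY; this.maxX = maxX; this.maxY = maxY;
        this.depth = depth;
    }

    // Adds a point; returns false if it lies outside this node's bounds.
    boolean insert(double x, double y) {
        if (x < minX || x > maxX || y < minY || y > maxY) return false;
        if (children == null) {
            if (points.size() < CAPACITY || depth >= MAX_DEPTH) {
                points.add(new double[] {x, y});
                return true;
            }
            split();
        }
        for (InMemoryQuadTree c : children) {
            if (c.insert(x, y)) return true;
        }
        return false;
    }

    // Turns this leaf into an inner node and pushes its points down.
    private void split() {
        double midX = (minX + maxX) / 2, midY = (minY + maxY) / 2;
        children = new InMemoryQuadTree[] {
            new InMemoryQuadTree(minX, minY, midX, midY, depth + 1),
            new InMemoryQuadTree(midX, minY, maxX, midY, depth + 1),
            new InMemoryQuadTree(minX, midY, midX, maxY, depth + 1),
            new InMemoryQuadTree(midX, midY, maxX, maxY, depth + 1)
        };
        for (double[] p : points) {
            for (InMemoryQuadTree c : children) {
                if (c.insert(p[0], p[1])) break;
            }
        }
        points.clear();
    }

    // Collects into 'out' every point inside the query box (bounds inclusive).
    void query(double qMinX, double qMinY, double qMaxX, double qMaxY, List<double[]> out) {
        if (qMaxX < minX || qMinX > maxX || qMaxY < minY || qMinY > maxY) return;
        for (double[] p : points) {
            if (p[0] >= qMinX && p[0] <= qMaxX && p[1] >= qMinY && p[1] <= qMaxY) out.add(p);
        }
        if (children != null) {
            for (InMemoryQuadTree c : children) c.query(qMinX, qMinY, qMaxX, qMaxY, out);
        }
    }

    // "Serialize it out at the end": the whole tree becomes one byte array.
    byte[] toBytes() {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(this);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // "Load the entire index in RAM when it starts": the whole tree is rebuilt.
    static InMemoryQuadTree fromBytes(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (InMemoryQuadTree) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With a design like this, both memory use and startup/shutdown time grow with the
total number of points, and nothing is durable between the two serialization
steps. That is what worries me for large indexes.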

>> I know that ARQ (i.e. the SPARQL query engine available in Jena) can
>> provide you with a SPARQL 1.1 engine and extension points to use other
>> custom indexes (such as SIS in this case).
>>
>> What exactly do you mean with "integrating with Any23"?
>> Do you mean crawling the web and extract lat/long from web pages?
> 
> Yep that's what I was thinking -- maybe doing it in Any23, and/or Tika.

Doing a web crawl to extract locations out of web pages using Any23|Tika seems
quite a useful thing for certain use cases.

In other scenarios people might already have a large dataset with locations in
it or people might want to leverage datasets such as Geonames, Freebase,
DBPedia, Yahoo GeoPlanet, etc. so crawling in these use cases is less important.

>> Where will you store those RDF statements?
> 
> It looks like Any23 would store to Sesame -- is that the case?

Probably.

It would be nice to have pluggable RDF stores in Any23, but this is another
story: https://issues.apache.org/jira/browse/ANY23-19 :-)

>> How can you implement the GeoSPARQL spec without (re)using a SPARQL
>> query engine (such as ARQ)?
> 
> I need that too :) I just don't understand it as well (and understand the
> Any23/Tika and SIS part better). I'll have to learn Jena it looks like though,
> you game to help me out?

A very old prototype which shows you how you can extend ARQ is here:
https://github.com/castagna/GeoARQ

It is just a prototype and it is using ARQ's property functions rather than
filter functions (and it is using Lucene spatial rather than SIS). But, it
is IMHO a good starting point to see how you could have ARQ using a custom
index to perform spatial searches.

I used Lucene spatial at the time because it was the only non-(L)GPL
alternative I found (I did not know about SIS then).

I did not implement GeoSPARQL for the sake of simplicity: I just wanted a
proof of concept, and the most important use case IMHO is searching for things
around a point and returning results sorted by distance.
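
To illustrate that use case (search around a point, results sorted by
distance), here is a minimal sketch. It is not code from GeoARQ, Lucene spatial
or SIS: NearbySearch, Place, haversineKm and near are hypothetical names, and
the linear scan only stands in for the lookup a real spatial index would do.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical names throughout; a brute-force stand-in for a spatial index.
class NearbySearch {

    static final double EARTH_RADIUS_KM = 6371.0; // mean Earth radius

    // A named location in decimal degrees.
    static final class Place {
        final String name;
        final double lat;
        final double lon;

        Place(String name, double lat, double lon) {
            this.name = name;
            this.lat = lat;
            this.lon = lon;
        }
    }

    // Great-circle distance in km between two points (haversine formula).
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    // Returns the places within radiusKm of (lat, lon), nearest first.
    static List<Place> near(List<Place> places, double lat, double lon, double radiusKm) {
        List<Place> hits = new ArrayList<>();
        for (Place candidate : places) {
            if (haversineKm(lat, lon, candidate.lat, candidate.lon) <= radiusKm) {
                hits.add(candidate);
            }
        }
        hits.sort(Comparator.comparingDouble(p -> haversineKm(lat, lon, p.lat, p.lon)));
        return hits;
    }
}
```

A real implementation would first ask the index for candidates in a bounding
box around the point and only then compute and sort exact distances, instead of
scanning every location as this sketch does.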

For GeoSPARQL (which I need to go back and read properly) do we need custom
FILTER functions or property functions (or both)?

>> IMHO geo location (as well as free text) are two SPARQL extensions which
>> are very useful in loads of use cases.
> 
> Yep I'm super excited to get this implemented. You interested in helping? I
> think we can bring together Tika, Any23, Jena and SIS here...

I am interested in learning more about SIS. At the moment I have no idea how
much effort is needed to implement GeoSPARQL, or whether that spec is going to
be implemented elsewhere by other RDF stores.

At the moment, I cannot put much effort into this. But if something along the
lines of LARQ and/or GeoARQ is useful and I can help, I'll do it.

I see two main use cases here:

 1. Crawling the web and building a dataset of statements with locations.
 2. Indexing a dataset of statements with locations and extending SPARQL to
    perform queries over it.

For 1. you need Any23|Tika (and a crawler) and, eventually, an RDF store.
For 2. you need SIS and a SPARQL query engine (which of course uses an RDF
store).

If I were trying to implement GeoSPARQL, I would start with 2., SIS and ARQ.

My 2 cents,
Paolo

> 
> Cheers,
> Chris
> 
>>> I'm CC'ing the any23-dev and jena-dev user lists (apologies for the SPAM
>>> guys) just to keep them in the loop.
>>>
>>> Cheers,
>>> Chris
>>>
>>> On Jan 31, 2012, at 4:59 AM, Andy Seaborne wrote:
>>>
>>>> Hi there,
>>>>
>>>> I'm investigating what it would take to implement GeoSPARQL.
>>>> There is already an Apache-licensed SPARQL engine in podling Jena.
>>>>
>>>> Of the things needed are a persistent storage layer with the right 
>>>> license.  Maybe the SIS project has something to use.
>>>>
>>>> If I understand it correctly, the qtree implementation is an in-memory 
>>>> structure, with the ability to read from a serialized form on disk, and 
>>>> to be able to write it to disk in that form.
>>>>
>>>> Is there any information on scaling for the qtree?  Memory usage?
>>>>
>>>> California_Restaurants.csv is 54K points - is that typical usage size?
>>>>
>>>> (yes ... there are other things needed as well such as conversion code 
>>>> between coordinate systems, format parsers, polygon code, ... but a start 
>>>> would be just for point data in one system :)
>>>>
>>>> An open copy of the spec is available at:
>>>>
>>>> http://www.w3.org/2011/02/GeoSPARQL.pdf
>>>>
>>>>    Andy
>>>
>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>> Chris Mattmann, Ph.D.
>>> Senior Computer Scientist
>>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>>> Office: 171-266B, Mailstop: 171-246
>>> Email: chris.a.mattm...@nasa.gov
>>> WWW:   http://sunset.usc.edu/~mattmann/
>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>> Adjunct Assistant Professor, Computer Science Department
>>> University of Southern California, Los Angeles, CA 90089 USA
>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>
> 
> 
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: chris.a.mattm...@nasa.gov
> WWW:   http://sunset.usc.edu/~mattmann/
> Phone: +1 (818) 354-8810
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> 
