Osma Suominen commented on JENA-1277:

I tried reproducing this. It takes a while though since the data set is rather 
large. I will attach the assembler/fuseki configuration file that I used.

First I created the TDB using tdbloader2:

tdbloader2 --loc tdb geonames.nt.gz

This took 69 minutes on my i3-2330M Ubuntu 16.04 laptop with SSD.

Then I created the spatial index. I had to experiment a bit to find out how 
much memory it needed. Luckily 6G was enough, since I'm on an 8G machine and 
couldn't have afforded much more:

java -Xmx6G -cp fuseki-server.jar jena.spatialindexer 

This took 19 minutes.

Next I ran Fuseki 1.4.1. I tweaked the fuseki-server startup script beforehand 
to give it 4G of memory, just in case.

./fuseki-server --config spatial-assembler.ttl

Finally I executed the query:

s-query --service=http://localhost:3030/ds/sparql --query query.rq --output=csv

I ran this a few times and the response time varied between 18 and 27 seconds. 
I got 2469 results, not 17 as you said on the mailing list. I suspect that your 
spatial index is somehow incomplete, since you got fewer results in a shorter 
time.

In any case, I can confirm that this spatial query is really slow.
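
For anyone who wants to repeat the timing without running s-query by hand, here 
is a rough Python sketch of the same measurement. It is not what I actually 
ran. It assumes the endpoint accepts the standard SPARQL protocol GET request 
with a text/csv Accept header, and the names run_query, time_runs and main are 
just made up for this example. The row count is approximate: it counts CSV 
lines after the header.

```python
import time
import urllib.parse
import urllib.request

# Service URL taken from the s-query call above.
ENDPOINT = "http://localhost:3030/ds/sparql"


def run_query(query, endpoint=ENDPOINT):
    """Send one SPARQL query via HTTP GET and return the number of CSV rows."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(url, headers={"Accept": "text/csv"})
    with urllib.request.urlopen(request) as response:
        lines = response.read().decode("utf-8").splitlines()
    # The first CSV line is the header with the variable names.
    return max(len(lines) - 1, 0)


def time_runs(fn, runs=5):
    """Call fn() `runs` times and return a list of (seconds, result) tuples."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        result = fn()
        timings.append((time.perf_counter() - start, result))
    return timings


def main():
    # query.rq is the same query file that was passed to s-query.
    with open("query.rq") as f:
        query = f.read()
    for seconds, rows in time_runs(lambda: run_query(query)):
        print(f"{seconds:.1f} s, {rows} results")
```

Calling main() with Fuseki running prints one line per run, which makes it 
easy to compare both the response times and the result counts between setups.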

> Spatial Queries Very Slow For Large Databases
> ---------------------------------------------
>                 Key: JENA-1277
>                 URL: https://issues.apache.org/jira/browse/JENA-1277
>             Project: Apache Jena
>          Issue Type: Improvement
>          Components: Spatial
>    Affects Versions: Jena 3.1.1
>         Environment: Linux Ubuntu
>            Reporter: samur araujo
> I loaded geonames on Jena but the spatial queries take more than 3s to 
> execute. The query is below:
> PREFIX spatial: <http://jena.apache.org/spatial#>
> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
> SELECT distinct ?place
> {
>     ?place spatial:intersectBox (32.55668 -117.12865 32.56668  -117.13865) .
> }
> The data can be downloaded here:
> https://drive.google.com/file/d/0B-fwYPJYT1GOYVVIZF9ROUxzclk/view?usp=sharing
> For small datasets the queries are executed in 200ms, very fast. I noticed 
> that when I access the Lucene index directly the queries are also very fast, 
> about 20ms. 
> The issue may be related to the post-processing of Lucene results.

This message was sent by Atlassian JIRA