[ 
https://issues.apache.org/jira/browse/JENA-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15827792#comment-15827792
 ] 

Osma Suominen commented on JENA-1277:
-------------------------------------

[~samuraraujo-geophy] Now I don't understand. The only attachment here in this 
JIRA issue, spatial-assembler.ttl, was added by me. I haven't seen your 
assembler configuration. Can you please attach it?

You also mention Solr. I was assuming that you used Lucene directly, not via 
Solr. jena-spatial does support both, though. Which one are you using?

I agree that it is strange that sorting takes so long in Lucene. I think the 
sorting algorithm must be very inefficient. I saw a warning about this in some 
jena-spatial presentation slides (sorry, I don't have the link anymore). Maybe 
there is another way of doing it that is faster, or perhaps it has improved in 
newer Lucene versions. I don't have any prior experience with Lucene spatial 
queries.

By the way, is the sorting by distance important for your use case? You can see 
the GitHub pull request I made: it removes the sorting, so the jena-spatial 
property functions return results in arbitrary order, which is much faster. 
Would this cause problems for your application?

The limit in the spatial query is used internally: it is passed to Lucene (or 
Solr) and controls the maximum number of raw results returned by the property 
function (e.g. spatial:intersectBox). That is different from the SPARQL LIMIT 
clause, which controls how many results the whole SPARQL query returns. Your 
query contains nothing but the property function, so every result from the 
property function is also returned by the query. In the general case, though, 
the SPARQL query may contain joins, filters, unions etc. that increase or 
decrease the number of result rows, so the final count won't match what the 
jena-spatial property function returned.

If you want e.g. 20000 results, you will need to raise the limit of the 
property function by passing 20000 as its limit parameter, since it defaults 
to 10000. SPARQL itself doesn't have a default limit, so unless you explicitly 
set one there using LIMIT, all results from the property function will also be 
returned by the SPARQL query.
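
As a rough sketch of the two limits, here is the query from this issue with a 
property-function limit of 20000. This assumes the limit is accepted as an 
optional fifth argument of spatial:intersectBox, which is how I read the 
jena-spatial documentation. The trailing LIMIT 100 is not part of the original 
query; it is only there to show the separate SPARQL-level limit.

```sparql
PREFIX spatial: <http://jena.apache.org/spatial#>

SELECT DISTINCT ?place
{
    # Fifth argument (assumed): maximum number of raw results requested
    # from the Lucene/Solr index by the property function.
    ?place spatial:intersectBox (32.55668 -117.12865 32.56668 -117.13865 20000) .
}
# SPARQL-level limit: caps the rows returned by the query as a whole.
LIMIT 100
```

With both in place, the index would be asked for at most 20000 matches, and 
the query would return at most 100 of them.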

> Spatial Queries Very Slow For Large Databases
> ---------------------------------------------
>
>                 Key: JENA-1277
>                 URL: https://issues.apache.org/jira/browse/JENA-1277
>             Project: Apache Jena
>          Issue Type: Improvement
>          Components: Spatial
>    Affects Versions: Jena 3.1.1
>         Environment: Linux Ubuntu
>            Reporter: samur araujo
>            Assignee: Osma Suominen
>         Attachments: spatial-assembler.ttl
>
>
> I loaded geonames on Jena but the spatial queries take more than 3s to 
> execute. The query is below:
> PREFIX spatial: <http://jena.apache.org/spatial#>
> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
> SELECT distinct ?place
> {
>     ?place spatial:intersectBox (32.55668 -117.12865 32.56668  -117.13865) .
>   
> }
> The data can be downloaded here:
> https://drive.google.com/file/d/0B-fwYPJYT1GOYVVIZF9ROUxzclk/view?usp=sharing
> For small datasets the queries are executed in 200ms, very fast. I noticed 
> that when I access the lucene index directly the queries are also very fast, 
> about 20ms. 
> The issue may be related to the post-processing of Lucene results.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
