[ https://issues.apache.org/jira/browse/JENA-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15827792#comment-15827792 ]

Osma Suominen commented on JENA-1277:
-------------------------------------
[~samuraraujo-geophy] Now I don't understand. The only attachment on this JIRA
issue, spatial-assembler.ttl, was added by me. I haven't seen your assembler
configuration. Can you please attach it?

You also mention Solr. I was assuming that you used Lucene directly, not via
Solr. jena-spatial supports both, though. Which one are you using?

I agree that it is strange that sorting takes so long in Lucene. I think the
sorting algorithm must be very inefficient. I saw a warning about this in some
jena-spatial presentation slides (sorry, I don't have the link anymore). Maybe
there is a faster way of doing it, or perhaps it has improved in newer Lucene
versions. I don't have any prior experience with Lucene spatial queries.

By the way, is sorting by distance important for your use case? See the GitHub
pull request I made: it removes the sorting, so the jena-spatial property
functions return results in arbitrary order, which is much faster. Would this
cause problems for your application?

The limit in the spatial query is used internally: it is passed to Lucene (or
Solr) and controls the maximum number of raw results returned by the property
function (e.g. spatial:intersectBox). That is different from the SPARQL LIMIT
clause, which controls how many results are returned by the whole SPARQL query.
Your query contains nothing but the property function, so every result of the
property function is also returned by the query. In the general case, though,
the SPARQL query may contain joins, filters, unions etc. that increase or
decrease the number of result rows, so the final count won't be the same as
what the jena-spatial property function returned.

If you want e.g. 20000 results, you will need to raise the limit of the
property function by passing 20000 as its limit parameter, since it defaults
to 10000. SPARQL itself doesn't have a default limit, so unless you explicitly
set one with LIMIT, all results from the property function will also be
returned by the SPARQL query.
> Spatial Queries Very Slow For Large Databases
> ---------------------------------------------
>
> Key: JENA-1277
> URL: https://issues.apache.org/jira/browse/JENA-1277
> Project: Apache Jena
> Issue Type: Improvement
> Components: Spatial
> Affects Versions: Jena 3.1.1
> Environment: Linux Ubuntu
> Reporter: samur araujo
> Assignee: Osma Suominen
> Attachments: spatial-assembler.ttl
>
>
> I loaded geonames on Jena but the spatial queries take more than 3s to
> execute. The query is below:
> PREFIX spatial: <http://jena.apache.org/spatial#>
> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
> SELECT distinct ?place
> {
> ?place spatial:intersectBox (32.55668 -117.12865 32.56668 -117.13865) .
>
> }
> The data can be downloaded here:
> https://drive.google.com/file/d/0B-fwYPJYT1GOYVVIZF9ROUxzclk/view?usp=sharing
> For small datasets the queries are executed in 200ms, very fast. I noticed
> that when I access the lucene index directly the queries are also very fast,
> about 20ms.
> The issue may be related to the post-processing of Lucene results.