[ 
https://issues.apache.org/jira/browse/LUCENE-5408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-5408:
---------------------------------

    Attachment: LUCENE-5408_GeometryStrategy.patch

This is intermediate progress; it still needs to be tested.  I also hope to 
share a Bits-based DocIdSet with [~mikemccand] in LUCENE-5418.  The question 
raised in that issue of how to handle super-slow Filters is a problem here too.

I had an epiphany last night that the current Spatial RPT grid and algorithm 
don't need to be modified to be able to differentiate the matching docs into 
confirmed & unconfirmed matches for common scenarios.  As such, to prevent 
misuse of the expensive Filter returned from this GeometryStrategy, I might 
force it to be paired with RecursivePrefixTreeStrategy, and then leave an 
expert method exposed to grab Bits or a Filter based purely on the Geometry 
DocValues check.  ElasticSearch and Solr wouldn't use that, but someone coding 
directly to Lucene would have the ability to wire things together in ways more 
flexible than are possible in ES or Solr.  The ideal way is to compute a 
fast pre-filter bitset separate from the slow post-filter, with user keyword 
queries and other filters in the middle.  But to operate best, the slow 
post-filter needs a side-artifact bitset computed when the pre-filter bitset 
is generated.  I'll eventually make this clearer in the javadocs.
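
To make the pre-filter / post-filter wiring concrete, here is a rough sketch in 
plain Java.  It is not code from the patch and uses no Lucene classes: 
java.util.BitSet stands in for the bitsets, an IntPredicate stands in for the 
slow geometry check, and every name (TwoPhaseSpatialSketch, PreFilterResult, 
postFilter) is hypothetical.  It only assumes the grid pass can hand back both 
the candidate docs and the subset it already confirmed.

```java
import java.util.BitSet;
import java.util.function.IntPredicate;

final class TwoPhaseSpatialSketch {

    /** Output of the fast grid pass (hypothetical type). */
    static final class PreFilterResult {
        final BitSet candidates; // docs the grid pass could not rule out
        final BitSet confirmed;  // side artifact: candidates already known to match

        PreFilterResult(BitSet candidates, BitSet confirmed) {
            this.candidates = candidates;
            this.confirmed = confirmed;
        }
    }

    /**
     * Applies the other (cheap) filters to the candidates first, then runs the
     * slow geometry check only on survivors that the grid pass left unconfirmed.
     */
    static BitSet postFilter(PreFilterResult pre, BitSet otherFilters,
                             IntPredicate slowGeometryCheck) {
        BitSet survivors = (BitSet) pre.candidates.clone();
        survivors.and(otherFilters);

        BitSet result = new BitSet();
        for (int doc = survivors.nextSetBit(0); doc >= 0;
             doc = survivors.nextSetBit(doc + 1)) {
            // Confirmed docs skip the expensive check entirely.
            if (pre.confirmed.get(doc) || slowGeometryCheck.test(doc)) {
                result.set(doc);
            }
        }
        return result;
    }
}
```

The point of the side-artifact bitset is visible in the loop: without 
`confirmed`, every survivor would need the expensive check.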

> GeometryStrategy -- match geometries in DocValues
> -------------------------------------------------
>
>                 Key: LUCENE-5408
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5408
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: modules/spatial
>            Reporter: David Smiley
>            Assignee: David Smiley
>             Fix For: 4.7
>
>         Attachments: LUCENE-5408_GeometryStrategy.patch
>
>
> I've started work on a new SpatialStrategy implementation I'm tentatively 
> calling GeometryStrategy.  It's similar to the [JtsGeoStrategy in 
> Spatial-Solr-Sandbox|https://github.com/ryantxu/spatial-solr-sandbox/tree/master/LSE/src/main/java/org/apache/lucene/spatial/pending/jts]
>  but a little different in the details -- certainly faster.  Using Spatial4j 
> 0.4's BinaryCodec, it'll serialize the shape to bytes (for polygons this is 
> internally WKB format) and the strategy will put it in a 
> BinaryDocValuesField.  In practice the shape is likely a polygon but it 
> needn't be.  Then I'll implement a Filter that returns a DocIdSetIterator 
> that evaluates a given document passed via advance(docid) to see if the 
> query shape matches a shape in DocValues. It's improper usage for it to be 
> used in a situation where it will evaluate every document id via nextDoc().  
> And in practice the DocValues format chosen should be a disk resident one 
> since each value tends to be kind of big.
> This spatial strategy in and of itself has no _index_; it's O(N) where N is 
> the number of documents that get passed thru it.  So it should be placed last 
> in the query/filter tree so that the other queries limit the documents it 
> needs to see.  At a minimum, another query/filter to use in conjunction is 
> another SpatialStrategy like RecursivePrefixTreeStrategy.
> Eventually once the PrefixTree grid encoding has a little bit more metadata, 
> it will be possible to further combine the grid & this strategy in such a way 
> that many documents won't need to be checked against the serialized geometry.
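
As a rough illustration of the per-document check described in the quoted 
text (serialize a shape to bytes, store it per document, decode and compare 
at query time), here is a small sketch in plain Java.  It makes loud 
assumptions: an axis-aligned rectangle stands in for an arbitrary shape, a 
hand-rolled ByteBuffer layout stands in for Spatial4j's BinaryCodec / WKB, and 
a byte[][] indexed by doc id stands in for binary DocValues.  All names are 
hypothetical.

```java
import java.nio.ByteBuffer;

final class SerializedShapeSketch {

    /** Axis-aligned rectangle, a stand-in for a real geometry. */
    static final class Rect {
        final double minX, minY, maxX, maxY;

        Rect(double minX, double minY, double maxX, double maxY) {
            this.minX = minX; this.minY = minY;
            this.maxX = maxX; this.maxY = maxY;
        }

        boolean intersects(Rect o) {
            return minX <= o.maxX && o.minX <= maxX
                && minY <= o.maxY && o.minY <= maxY;
        }
    }

    /** Serializes the rectangle as four doubles (32 bytes). */
    static byte[] encode(Rect r) {
        return ByteBuffer.allocate(4 * Double.BYTES)
            .putDouble(r.minX).putDouble(r.minY)
            .putDouble(r.maxX).putDouble(r.maxY)
            .array();
    }

    /** Inverse of encode: reads the four doubles back in the same order. */
    static Rect decode(byte[] bytes) {
        ByteBuffer b = ByteBuffer.wrap(bytes);
        return new Rect(b.getDouble(), b.getDouble(), b.getDouble(), b.getDouble());
    }

    /**
     * Decodes the stored shape for one doc id and tests it against the query.
     * The cost is paid per document, so callers should only ask about docs
     * that cheaper filters have already let through.
     */
    static boolean matches(byte[][] docValues, int docId, Rect query) {
        byte[] stored = docValues[docId];
        return stored != null && decode(stored).intersects(query);
    }
}
```

Because nothing here is indexed, the total cost grows with the number of doc 
ids passed to `matches`, which is why this kind of check belongs last in the 
query/filter tree.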



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
