[ https://issues.apache.org/jira/browse/LUCENE-6712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Nicholas Knize updated LUCENE-6712:
-----------------------------------
Attachment: LUCENE-6712.patch
Awesome! Thanks for the review, Mike! An updated patch addressing your comments
is attached.
bq. this is mixing up separate changes I think? One change is cutover to doc
values for the point filtering of each lat/lon, and the other is changing the
lower detail level and higher prec step?
Indeed. The former improves search performance; the latter improves indexing
performance. I can split these into two patches if you'd like, so we can
investigate the impact of changing the precision step separately?
bq. Shouldn't you iterate through all values and accept the docs if any of them
were in-bounds? Can you add a test case that exposes this?
++ Thanks for pointing that out! I had intended to change that. It's fixed in
the attached patch; I also added explicit multi-valued documents and tests to
cover this. Randomized multi-valued documents would be nice, though I don't
think that should block the patch?
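To make the any-match rule concrete, here is a minimal sketch in plain Java, assuming each document's points have already been decoded from doc values into a list. `MultiValuedPostFilter`, `LatLon` and `BBox` are hypothetical illustrative types, not Lucene classes; the point is only that the loop must visit every value instead of deciding on the first one.

```java
import java.util.List;

// Hypothetical sketch (not Lucene API): a document may hold several lat/lon
// points, so the boundary post-filter must accept it if ANY value is in bounds.
final class MultiValuedPostFilter {

  /** One decoded point; an illustrative type, not a Lucene class. */
  static final class LatLon {
    final double lat;
    final double lon;

    LatLon(double lat, double lon) {
      this.lat = lat;
      this.lon = lon;
    }
  }

  /** Simple query box; dateline crossing is ignored for brevity. */
  static final class BBox {
    final double minLat, maxLat, minLon, maxLon;

    BBox(double minLat, double maxLat, double minLon, double maxLon) {
      this.minLat = minLat;
      this.maxLat = maxLat;
      this.minLon = minLon;
      this.maxLon = maxLon;
    }

    boolean contains(LatLon p) {
      return p.lat >= minLat && p.lat <= maxLat && p.lon >= minLon && p.lon <= maxLon;
    }
  }

  /** Correct: iterate every value and accept on the first in-bounds hit. */
  static boolean acceptsAny(List<LatLon> values, BBox box) {
    for (LatLon p : values) {
      if (box.contains(p)) {
        return true; // one in-bounds value is enough to match the doc
      }
    }
    return false; // no value of this doc is inside the box
  }

  /** Buggy variant for contrast: inspects only the first value. */
  static boolean acceptsFirstOnly(List<LatLon> values, BBox box) {
    return !values.isEmpty() && box.contains(values.get(0));
  }
}
```

A randomized test could compare `acceptsAny` against a brute-force check over documents carrying a random number of points.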
bq. Couldn't GeoPointTermsEnum just have an abstract acceptLatLon method?
++ I had gone back and forth on this a couple of times. With DV post-filtering
it now makes more sense for GeoPointTermsEnum to be abstract, with an abstract
postFilter method. Previously most of the logic was shared; only crosses and
within were fully overridden in the Poly and Distance query classes. I went
ahead and made the change in the attached patch.
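A rough sketch of that structure, assuming plain arrays of decoded points: the base class keeps the shared iteration and each query type overrides only `postFilter`. `ShapePostFilter`, `BBoxPostFilter` and `DistancePostFilter` are hypothetical names, not the actual GeoPointTermsEnum hierarchy, and the haversine distance with a 6,371,008.8 m mean Earth radius is an assumed stand-in for whatever distance function the real Distance query uses.

```java
// Hypothetical sketch (not the actual GeoPointTermsEnum API): the shared base
// class owns the loop over candidate points; each shape supplies postFilter.
abstract class ShapePostFilter {

  /** Shape-specific check applied to one decoded doc-values point. */
  protected abstract boolean postFilter(double lat, double lon);

  /** Shared logic: count the candidate points that pass postFilter. */
  final int countAccepted(double[] lats, double[] lons) {
    int accepted = 0;
    for (int i = 0; i < lats.length; i++) {
      if (postFilter(lats[i], lons[i])) {
        accepted++;
      }
    }
    return accepted;
  }
}

/** Bounding-box query: plain range checks (no dateline handling). */
final class BBoxPostFilter extends ShapePostFilter {
  private final double minLat, maxLat, minLon, maxLon;

  BBoxPostFilter(double minLat, double maxLat, double minLon, double maxLon) {
    this.minLat = minLat;
    this.maxLat = maxLat;
    this.minLon = minLon;
    this.maxLon = maxLon;
  }

  @Override
  protected boolean postFilter(double lat, double lon) {
    return lat >= minLat && lat <= maxLat && lon >= minLon && lon <= maxLon;
  }
}

/** Distance query: haversine great-circle distance from a center point. */
final class DistancePostFilter extends ShapePostFilter {
  // Mean Earth radius in meters; an assumption, the real query may differ.
  private static final double EARTH_RADIUS_M = 6_371_008.8;
  private final double centerLat, centerLon, radiusMeters;

  DistancePostFilter(double centerLat, double centerLon, double radiusMeters) {
    this.centerLat = centerLat;
    this.centerLon = centerLon;
    this.radiusMeters = radiusMeters;
  }

  @Override
  protected boolean postFilter(double lat, double lon) {
    double dLat = Math.toRadians(lat - centerLat);
    double dLon = Math.toRadians(lon - centerLon);
    double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
        + Math.cos(Math.toRadians(centerLat)) * Math.cos(Math.toRadians(lat))
            * Math.sin(dLon / 2) * Math.sin(dLon / 2);
    // Clamp guards against tiny floating-point overshoot above 1.0.
    double distance = 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(Math.min(1.0, a)));
    return distance <= radiusMeters;
  }
}
```

Keeping the loop `final` in the base class means a new shape (a polygon, say) only has to supply its point test.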
bq. It looks like you continue using full precision terms to approximate the
shape's boundary right?
No, the Range instances now use lower-precision terms for the boundaries (up to
PRECISION_STEP * MAX_SHIFT, which works out to no higher than level 18).
GPTQConstantScoreWrapper iterates the doc IDs in the postings list, so
full-precision terms (18 < level < 32) are never used and really just waste
space in the index. I suppose I could modify GeoPointField to index only up to
a shift of PRECISION_STEP * MAX_SHIFT and further reduce the index size?
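To illustrate the index-size argument, here is a hypothetical sketch of trie-style prefix terms for a 64-bit interleaved value: one term per shift of `precisionStep` bits, optionally skipping the finest shifts that queries never read. It assumes two bits per quadtree level (so level = (64 - shift) / 2, making shift 28 correspond to level 18), ignores the actual term byte encoding, and uses illustrative parameter values rather than the real PRECISION_STEP and MAX_SHIFT constants.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a trie-encoded 64-bit interleaved (Morton) value is
// indexed as one prefix term per shift = 0, step, 2*step, ... below 64.
// Skipping shifts below minShift drops the finest-precision terms.
// Assumes 2 bits per quadtree level; term byte encoding is ignored.
final class PrefixTermSketch {

  /** Shifts at which a prefix term would be emitted for every indexed value. */
  static List<Integer> indexedShifts(int precisionStep, int minShift) {
    if (precisionStep <= 0) {
      throw new IllegalArgumentException("precisionStep must be positive");
    }
    List<Integer> shifts = new ArrayList<>();
    for (int shift = 0; shift < 64; shift += precisionStep) {
      if (shift >= minShift) {
        shifts.add(shift); // coarse enough that queries may read this term
      }
    }
    return shifts;
  }

  /** Prefix term for one shift: the value with its low {@code shift} bits cleared. */
  static long prefix(long morton, int shift) {
    return (morton >>> shift) << shift;
  }

  /** Quadtree level represented by a shift, under the 2-bits-per-level assumption. */
  static int level(int shift) {
    return (64 - shift) / 2;
  }
}
```

For example, with a hypothetical precisionStep of 9, indexing every shift emits 8 terms per value, while skipping shifts below 28 emits only 4.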
> GeoPointField should cut over to DocValues for boundary filtering
> -----------------------------------------------------------------
>
> Key: LUCENE-6712
> URL: https://issues.apache.org/jira/browse/LUCENE-6712
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Nicholas Knize
> Attachments: LUCENE-6712.patch, LUCENE-6712.patch, LUCENE-6712.patch
>
>
> Currently GeoPointField queries use only the Terms Dictionary for ranges that
> fall within or on the boundary of the query shape. For boundary ranges the
> full-precision terms are iterated; for within ranges the postings list is
> used.
> Instead of iterating full-precision terms for boundary ranges, this
> enhancement cuts over to DocValues for post-filtering boundary terms. This
> allows us to increase precisionStep for GeoPointField, thereby reducing the
> number of terms and the size of the index. This enhancement should also
> boost query performance, since visiting more docs and fewer terms should be
> more efficient than visiting fewer docs and more terms.
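The two-phase matching described in the issue can be sketched roughly as follows, with plain arrays standing in for Lucene's terms, postings lists and DocValues. `TwoPhaseGeoMatch`, `Range`, `Relation` and `Shape` are hypothetical names for illustration only; documents are single-valued here for brevity (a multi-valued version would accept a doc if any of its points passes `shape.contains`).

```java
import java.util.BitSet;
import java.util.List;

// Hypothetical sketch: docs from ranges fully WITHIN the shape match straight
// from the postings; docs from ranges that CROSS the boundary are candidates
// confirmed against each doc's point (arrays stand in for DocValues).
final class TwoPhaseGeoMatch {

  /** How a coarse term range relates to the query shape. */
  enum Relation { WITHIN, CROSSES }

  /** A coarse term range and the doc IDs in its postings list. */
  static final class Range {
    final Relation relation;
    final int[] postings;

    Range(Relation relation, int[] postings) {
      this.relation = relation;
      this.postings = postings;
    }
  }

  /** Exact point-in-shape test, used only for boundary candidates. */
  interface Shape {
    boolean contains(double lat, double lon);
  }

  static BitSet match(List<Range> ranges, double[] docLat, double[] docLon, Shape shape) {
    BitSet hits = new BitSet(docLat.length);
    for (Range range : ranges) {
      for (int doc : range.postings) {
        if (range.relation == Relation.WITHIN) {
          hits.set(doc); // whole cell is inside the shape: no per-doc check
        } else if (shape.contains(docLat[doc], docLon[doc])) {
          hits.set(doc); // boundary cell: confirm with the doc's stored point
        }
      }
    }
    return hits;
  }
}
```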
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)