[
https://issues.apache.org/jira/browse/LUCENE-7401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15401949#comment-15401949
]
Michael McCandless commented on LUCENE-7401:
--------------------------------------------
bq. what happens eg. if you want to index all towns in the world alongside
their population as a 3rd dimension. Given that there are very large areas that
only have small towns, it could happen that the population dimension does not
get indexed at all in these areas?
That's a good example! In that case, with our current splitting, running a
range filter for "small population" will be costly. Though, without other
filters (by lat/lon), it will likely be costly anyway, since town population
probably follows something like Zipf's law? I.e., most areas will still have
many more small-population towns than big ones.
bq. Hmm this got me curious, why is it an adversarial case if all points are
equidistant from an origin?
Oh, it results in long, slivery KD cells, which means queries have to visit
too many points.
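To make the "slivery cells" point concrete, here is a rough sketch, not
BKDWriter itself: a toy in-memory 2D builder that splits at the median of the
widest dimension, run on points that all sit on a unit circle. The names
(build_leaves, aspect_ratio), the leaf size of 16, and the 1024-point circle
are assumptions made up for illustration.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def spans(points: List[Point]) -> Tuple[float, float]:
    """Return the (x, y) extent of the bounding box around the points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return max(xs) - min(xs), max(ys) - min(ys)


def build_leaves(points: List[Point], max_leaf_size: int,
                 leaves: List[List[Point]]) -> None:
    """Toy KD build: split at the median of the dimension with the largest span."""
    if len(points) <= max_leaf_size:
        leaves.append(points)
        return
    span_x, span_y = spans(points)
    dim = 0 if span_x >= span_y else 1
    ordered = sorted(points, key=lambda p: p[dim])
    mid = len(ordered) // 2
    build_leaves(ordered[:mid], max_leaf_size, leaves)
    build_leaves(ordered[mid:], max_leaf_size, leaves)


def aspect_ratio(points: List[Point]) -> float:
    """Long side divided by short side of the leaf's bounding box."""
    span_x, span_y = spans(points)
    short, long_side = min(span_x, span_y), max(span_x, span_y)
    return long_side / short if short > 0 else math.inf


# Hypothetical adversarial input: 1024 points, all at distance 1 from the origin.
n = 1024
circle = [(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
          for k in range(n)]

leaves: List[List[Point]] = []
build_leaves(circle, 16, leaves)
worst = max(aspect_ratio(leaf) for leaf in leaves)
print(len(leaves), "leaves; worst aspect ratio:", worst)
```

In this setup the leaves that hug the axis crossings of the circle come out
more than ten times longer than they are wide, which is one way to picture
the slivers: a query shape overlapping such a cell touches a long box for the
sake of very few matching points.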
> BKDWriter should ensure all dimensions are indexed
> --------------------------------------------------
>
> Key: LUCENE-7401
> URL: https://issues.apache.org/jira/browse/LUCENE-7401
> Project: Lucene - Core
> Issue Type: Bug
> Reporter: Adrien Grand
> Priority: Minor
>
> The current heuristic is to use the dimension that has the largest span, so
> if dimensions have a different distribution of values, we could theoretically
> (but maybe in practice too?) end up with one dimension that is not indexed at
> all and queries that are mostly selective on this dimension would need to
> scan lots of blocks.
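A minimal sketch of how the largest-span heuristic can leave one dimension
unsplit, assuming a toy in-memory median-split builder rather than the real
BKDWriter. The helper names (widest_dimension, build, leaves_matching), the
leaf size of 16, and the grid data are all hypothetical: dimensions 0 and 1
stand in for lat/lon with a wide value range, and dimension 2 for a narrow
third field.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def widest_dimension(points: Sequence[Point]) -> int:
    """Return the index of the dimension with the largest (max - min) span."""
    dims = len(points[0])
    spans = [max(p[d] for p in points) - min(p[d] for p in points)
             for d in range(dims)]
    # max() returns the first maximum, so ties go to the lowest dimension index.
    return max(range(dims), key=lambda d: spans[d])


def build(points: List[Point], max_leaf_size: int,
          split_counts: List[int], leaves: List[List[Point]]) -> None:
    """Recursively split at the median of the widest dimension."""
    if len(points) <= max_leaf_size:
        leaves.append(points)
        return
    dim = widest_dimension(points)
    split_counts[dim] += 1
    ordered = sorted(points, key=lambda p: p[dim])
    mid = len(ordered) // 2
    build(ordered[:mid], max_leaf_size, split_counts, leaves)
    build(ordered[mid:], max_leaf_size, split_counts, leaves)


def leaves_matching(leaves: List[List[Point]], dim: int,
                    lo: float, hi: float) -> int:
    """Count leaf cells whose value range on `dim` overlaps [lo, hi]."""
    count = 0
    for leaf in leaves:
        cell_min = min(p[dim] for p in leaf)
        cell_max = max(p[dim] for p in leaf)
        if cell_min <= hi and cell_max >= lo:
            count += 1
    return count


# Hypothetical data: a 32 x 32 grid. Dimensions 0 and 1 span 0..310,
# dimension 2 only takes the values 0..3.
points = [(i * 10.0, j * 10.0, float((i + j) % 4))
          for i in range(32) for j in range(32)]

split_counts = [0, 0, 0]
leaves: List[List[Point]] = []
build(points, 16, split_counts, leaves)

print(split_counts)                           # [21, 42, 0]
print(len(leaves))                            # 64
print(leaves_matching(leaves, 0, 0.0, 70.0))  # 16
print(leaves_matching(leaves, 2, 0.0, 0.0))   # 64
```

Both range filters match a quarter of the points (256 of 1024), but the one
on dimension 0 only has to visit 16 of the 64 leaves, while the one on
dimension 2 has to visit all of them because that dimension was never chosen
for a split.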