[ https://issues.apache.org/jira/browse/LUCENE-6360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14551953#comment-14551953 ]
Adrien Grand commented on LUCENE-6360:
--------------------------------------
Sorry Paul, I had missed your comment.
bq. I wonder whether a compressing DocIdSet could also help here.
This is something we are already doing: we use BitDocIdSetBuilder, which
internally starts with a sparse bit set and upgrades to a dense FixedBitSet if
the cardinality becomes high. I agree this is already a win, but having actual
skipping support would be even better: if you intersect with a sparse filter,
you would not even need to iterate over documents that don't match the
filter.
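To illustrate the skipping argument, here is a minimal plain-Java sketch of a leapfrog intersection. SortedDocs is a hypothetical stand-in for a doc-ID iterator with an advance method, not Lucene's DocIdSetIterator, and the visited counter exists only to show how few entries of the dense side get touched.

```java
import java.util.ArrayList;
import java.util.List;

final class SkippingSketch {

    /** Hypothetical sorted doc-ID iterator with skipping; not a Lucene class. */
    static final class SortedDocs {
        static final int NO_MORE_DOCS = Integer.MAX_VALUE;

        private final int[] docs; // sorted ascending, no duplicates
        private int pos = -1;
        int visited = 0; // number of entries this iterator has landed on

        SortedDocs(int... docs) {
            this.docs = docs;
        }

        int docID() {
            if (pos < 0) {
                return -1;
            }
            return pos >= docs.length ? NO_MORE_DOCS : docs[pos];
        }

        /** Moves to the first doc >= target via binary search, skipping everything in between. */
        int advance(int target) {
            int lo = pos + 1;
            int hi = docs.length;
            while (lo < hi) {
                int mid = (lo + hi) >>> 1;
                if (docs[mid] < target) {
                    lo = mid + 1;
                } else {
                    hi = mid;
                }
            }
            pos = lo;
            if (pos < docs.length) {
                visited++;
            }
            return docID();
        }
    }

    /** Leapfrog intersection: each side advances directly to the other side's current doc. */
    static List<Integer> intersect(SortedDocs a, SortedDocs b) {
        List<Integer> out = new ArrayList<>();
        int doc = a.advance(0);
        while (doc != SortedDocs.NO_MORE_DOCS) {
            int other = b.advance(doc);
            if (other == doc) {
                out.add(doc);
                doc = a.advance(doc + 1);
            } else if (other == SortedDocs.NO_MORE_DOCS) {
                break;
            } else {
                doc = a.advance(other);
            }
        }
        return out;
    }
}
```

With a two-document filter as the first argument and a 2000-document set as the second, the dense side only lands on the two candidate documents instead of being consumed entirely, which is the behaviour a bit-set-based TermsQuery cannot offer.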
bq. To figure out the threshold(s), real life test cases would be helpful. Do
you have some in mind already?
My current idea is to use the same threshold here as when running
MultiTermQueries (see LUCENE-6458).
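A rough plain-Java sketch of what sharing one threshold could look like. The class, the enum, and the constant value are hypothetical placeholders for illustration only; the actual threshold still has to be determined from real test cases.

```java
final class RewriteThresholdSketch {

    /** The two execution strategies discussed in this issue. */
    enum Strategy {
        /** ConstantScoreQuery over a BooleanQuery of optional clauses: can skip. */
        BOOLEAN_DISJUNCTION,
        /** Collect all matching documents into a doc-ID set up front: cannot skip. */
        DOC_ID_SET
    }

    /** Placeholder value (an assumption), shared by both query types in this sketch. */
    static final int TERM_COUNT_THRESHOLD = 16;

    /** Picks the skipping-friendly strategy when there are few terms. */
    static Strategy choose(int termCount, int threshold) {
        if (termCount < 0 || threshold < 0) {
            throw new IllegalArgumentException("termCount and threshold must be >= 0");
        }
        return termCount <= threshold ? Strategy.BOOLEAN_DISJUNCTION : Strategy.DOC_ID_SET;
    }
}
```

Both TermsQuery and the MultiTermQuery rewrite would then call choose with the same TERM_COUNT_THRESHOLD, so tuning the value in one place affects both.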
> TermsQuery should rewrite to a ConstantScoreQuery over a BooleanQuery when
> there are few terms
> ----------------------------------------------------------------------------------------------
>
> Key: LUCENE-6360
> URL: https://issues.apache.org/jira/browse/LUCENE-6360
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Assignee: Adrien Grand
> Priority: Minor
>
> TermsQuery helps when there are a lot of terms whose union you would like to
> compute, but it is a bit harmful when you have few terms since it cannot
> really skip: it always consumes all documents matching the underlying
> terms.
> It would certainly help to rewrite this query to a ConstantScoreQuery over a
> BooleanQuery when there are few terms in order to have actual skip support.
> As usual the hard part is probably to figure out the threshold. :)