[
https://issues.apache.org/jira/browse/LUCENE-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12887230#action_12887230
]
Michael McCandless commented on LUCENE-2130:
--------------------------------------------
I think the slowness comes from a particularly bad interaction b/w how
FuzzyTermsEnum uses MultiTermsEnum.seek during the rewrite, and MTE's impl of
seek.
EG if you seek MTE to term Naa. Say there are 4 segments, that seek to these
terms: Nab M O P.
If you next seek to term Nac, then the 3 segments (M O P above) will do tons of
work (a full re-seek, ie binary search through the index, re-scan) only to
discover they are still at M O P. There's an O(N^2) factor at play, where N =
term index interval, and the constant factor is highish (decoding each term
entry on scan).
I think we could conceivably fix MTE to better handle this "seeking a bit
forward at a time" case, eg by keeping track of last seek term and not touching
the M O P segments since we know they'll just seek back to the same point, I
suspect if we do so the perf will still be sizably slower than if we can
rewrite per-segment.
> Investigate Rewriting Constant Scoring MultiTermQueries per segment
> -------------------------------------------------------------------
>
> Key: LUCENE-2130
> URL: https://issues.apache.org/jira/browse/LUCENE-2130
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Search
> Reporter: Mark Miller
> Fix For: 4.0
>
> Attachments: LUCENE-2130.patch, LUCENE-2130.patch
>
>
> This issue is likely not to go anywhere, but I thought we might explore it.
> The only idea I have come up with is fairly ugly, and unless something better
> comes up, this is not likely to happen.
> But if we could rewrite constant score multi-term queries per segment, MTQ's
> with auto (when the heuristic doesnt cut over to constant filter), or
> constant boolean rewrite could enum terms against a single segment and then
> apply a boolean query against each segment with just the terms that are known
> to be in that segment. This also allows you to avoid DirectoryReaders
> MultiTermEnum and its PQ. (See Roberts comment below).
> No biggie, not likely, but what the heck.
> So the ugly way to do it is to add a property to query's and weights -
> lateCnstRewrite or something, that defaults to false. MTQ would return true
> if its in a constant score mode. On the top level rewrite, if this is
> detected, an empty ConstantScoreQuery is made, and its Weight is turned to
> lateCnstRewrite and it keeps a ref to the original MTQ query. It also gets
> its boost set to the MTQ's boost. Then when we are searching per segment, if
> the Weight is lateCnstRewrite, we grab the orig query and actually do the
> rewrite against the subreader and grab the actual constantscore weight. It
> works I think - but its a little ugly.
> Not sure its worth the baggage for the win - but perhaps the objective can be
> met in another way.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]