[
https://issues.apache.org/jira/browse/LUCENE-6308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14379227#comment-14379227
]
David Smiley commented on LUCENE-6308:
--------------------------------------
The pain of not using _any_ review tool (no matter how imperfect any might be)
is quite apparent to me with these patch updates and multi-user interaction.
Let the pain continue? Rob; you could ignore the RB if so long as if the
patches are here for you to do the perf testing. I don't mean to suggest we
should use a review tool for every JIRA issue.
bq. SpanNear: when the subSpans are initialized and one of them is null, null
is returned for the NearSpans... . I think SpanNear should have at least two
subqueries, I did not expect that to break backward compatibility. The single
subquery case could be restored, but is it really needed? When this happens the
caller should be aware that the slop is going to be ignored, and in the ordered
case there should be no match at all because the subSpans should not overlap
then.
I simply recognized it as a difference and wondered why. If some thing
"should" not happen (you claim SpanNear should have at least 2 sub-queries)
then an assertion or comment to that effect would be good. As to the
particulars of this case, I'm not sure at the moment without further inspection.
bq. SpanNotQuery and SpanScorer: as I understood there is no need for these two
implement TwoPhaseIterator because that is only needed when there is some form
of conjunction. SpanScorer only takes a single Spans, and for SpanNot advance()
on the excluded spans only is good enough.
I believe it's useful for any aggregation, even over one so that the TPI
benefits percolate across parent/child queries in aggregate. [~jpountz] please
check me on this. Having said this I don't know why TPIs aren't propagated
more generally; maybe a TODO?.
On to my example: If any Query that wraps another Query (technically a Scorer
wrapping another Scorer, or Span wrapping another Span here) fails to propagate
the TwoPhaseIterator, then the benefit of the TPI is only localized to the
nodes that directly support it (e.g. BooleanQuery & SpanNearQuery). Consider
the case of a top level BooleanQuery wrapping two MUST clauses, one clause with
a medium-frequency term, and the other clause being a SpanNearQuery between
some other SpanQuery derivatives which include a low frequency term. With your
patch's SpanScorer _not_ supporting TPI, the top level BQ won't see a TPI so it
will drive the query by the low-cost SpanQuery which will force it to visit
positions for documents that could have been filtered by the top level term
query's document list.
bq. Removing Spans/intervals completely is not an option: with (nested)
proximity queries there is no way to avoid iterating over intervals/spans. We
can make the scoring consistent between Spans and PhraseQuery, and PhraseQuery
with getSpans() would also be good for performance of common cases.
I could have been clearer. I simply mean I look forward to the _dichotomy_
being gone or made easier. That is, today, a SpanNearQuery only accepts
SpanQueries, which forces us to convert other queries to them (even TermQuery),
which is a pain. Any way, that discussion should be left to LUCENE-2878 so I
shouldn't have brought it up here.
> SpansEnum, deprecate Spans
> --------------------------
>
> Key: LUCENE-6308
> URL: https://issues.apache.org/jira/browse/LUCENE-6308
> Project: Lucene - Core
> Issue Type: Bug
> Components: core/search
> Affects Versions: Trunk
> Reporter: Paul Elschot
> Priority: Minor
> Attachments: LUCENE-6308-changeapi.patch,
> LUCENE-6308-changeapi.patch, LUCENE-6308-changeapi.patch,
> LUCENE-6308-changeapi.patch, LUCENE-6308.patch, LUCENE-6308.patch,
> LUCENE-6308.patch, LUCENE-6308.patch, LUCENE-6308.patch, LUCENE-6308.patch
>
>
> An alternative for Spans that looks more like PositionsEnum and adds two
> phase doc id iteration
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]