[jira] [Commented] (LUCENE-6308) SpansEnum, deprecate Spans

David Smiley (JIRA) Tue, 24 Mar 2015 10:46:59 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-6308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14378250#comment-14378250
 ]


David Smiley commented on LUCENE-6308:
--------------------------------------

Without the benefit of a code review tool; I'll put my comments here:

NearSpans:
* Why 3x in cost()?  SpanNotQuery has 2x cost. Hmmm.

NearSpansOrdered:
* The payload collection in shrinkToAfterShortestMatch seems suspect.  I think 
the loop over prevSpans should only assign a new ArrayList to possiblePayload 
if it was previously null, and otherwise should call clear.  And this should 
happen wether or not prevSpans.isPayloadAvailable().  possiblePayload should be 
pluralized to more clearly indicate it’s a collection of payloads, and a 
comment could clarify we declare it out of the loop to re-use the instance.  
Heck; maybe it should be declared as a field of this spans along with 
possibleMatchPayloads set for re-use, and then always initialize them so long 
as collectPayloads==true.  And, related, I think matchPayload should be an 
ArrayList not LinkedList — I’ve seen payload iteration code that optimized for 
RandomAccess so lets return one.
* Thanks to ConjunctionDISI, it’s nice to see a net reduction in code in 
NearSpans* :-)
* Nice use of FilterSpans for NearSpansUnordered.SpansCell and elsewhere

SpanPositionCheckQuery
* hashCode & equals should be defined.  Then it’s subclasses (some of which you 
have modified) can call super.equals & super.hashCode.  Furthermore, as of 
March 4th, we needn’t add getClass().hashCode as Query.hashCode will do that.
* PositionCheckSpans does not implement asTwoPhaseIterator. Might this be a 
problem? If the caller gets the TPI, then it's positions won't be filtered by 
acceptPosition().  Perhaps FilterSpans shouldn't delegate asTwoPhaseIterator 
because of subtle bugs like this.

SpanNearQuery
* getSpans: why the removal of the 1-clause and 0-clause optimized cases?  And 
I noticed you didn’t propagate collectPayloads to NearSpansOrdered.  (The 
Unordered case doesn’t need a boolean).

SpanOrQuery 
* Spans.cost(): why not pre-compute as it was?  cost() could get called a bunch 
during sorting of queries.

SpanNotQuery
* is asTwoPhraseIterator a TODO?  I think there's value in it here.

SpanScorer:
* shouldn’t this be implementing asTwoPhaseIterator to delegate to the spans?

Overall, this is clearly going to bring some great speed-ups to complex span 
queries.  Nice job Paul!  Longer term, I hope LUCENE-2878 (nuke Spans) comes to 
fruition so that we still don't have the Query dichotomy.

> SpansEnum, deprecate Spans
> --------------------------
>
>                 Key: LUCENE-6308
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6308
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: Trunk
>            Reporter: Paul Elschot
>            Priority: Minor
>         Attachments: LUCENE-6308-changeapi.patch, 
> LUCENE-6308-changeapi.patch, LUCENE-6308-changeapi.patch, LUCENE-6308.patch, 
> LUCENE-6308.patch, LUCENE-6308.patch, LUCENE-6308.patch, LUCENE-6308.patch, 
> LUCENE-6308.patch
>
>
> An alternative for Spans that looks more like PositionsEnum and adds two 
> phase doc id iteration



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (LUCENE-6308) SpansEnum, deprecate Spans

Reply via email to