[jira] [Commented] (LUCENE-4574) FunctionQuery ValueSource value computed twice per document

David Smiley (JIRA) Fri, 30 Nov 2012 09:32:00 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-4574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13507476#comment-13507476
 ]


David Smiley commented on LUCENE-4574:
--------------------------------------

Indeed this is complicated.  It's par for the course in the part of Lucene I 
figure.

bq. Is there some way we could invert this (e.g. so its boolean collectsScores 
or something?).

I hear ya; it was getting late and when I originally named the variable it was 
appropriate since it was only for the nonScoring collectors.

RE ScoringTwiceCollector, I'll just trust you that this approach makes sense as 
I'm not familiar with the distinguishing details between the half-dozen 
collectors to know that ScoringTwiceCollector can always be used in leu of 
those.

I applied your patch.  Then I applied the part of my patch for 
RelevanceComparator to detect when a score for the same doc is fetched twice in 
a row.  I recognized a small bug there in which I forgot to re-initialize 
_lastDocId to -1 on setScorer(), so I fixed that trivially.

But TestShardSearching.testSimple() still fails.  So I took a closer at the 
Collector calling it to see why.  OneComparatorNonScoringCollector.collect() 
opens up with this:
{code:java}
    public void collect(int doc) throws IOException {
      ++totalHits;
      if (queueFull) {
        if ((reverseMul * comparator.compareBottom(doc)) <= 0) {
          // since docs are visited in doc Id order, if compare is 0, it means
          // this document is larger than anything else in the queue, and
          // therefore not competitive.
          return;
        }
        
        // This hit is competitive - replace bottom element in queue & adjustTop
        comparator.copy(bottom.slot, doc);
...
{code} 

Notice that there is a call to comparator.compareBottom() and comparator.copy() 
here.  Both of these for RelevanceComparator fetch the score.

So maybe RelevanceComparator.setScorer still needs to wrap its scorer with the 
caching one for such cases.

So why do you hate this very simple cache so much?  Figuring out exactly when 
to do it but not otherwise has the adverse effect of complicating further 
already complicated code and as a consequence increases the risk that at some 
point after future changes the conditions become wrong and triggers a query to 
take twice as long.  But this cache is so light-weight that it is probably too 
hard to measure the appreciable difference of doing unnecessary caching.
                
> FunctionQuery ValueSource value computed twice per document
> -----------------------------------------------------------
>
>                 Key: LUCENE-4574
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4574
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 4.0, 4.1
>            Reporter: David Smiley
>            Assignee: David Smiley
>         Attachments: LUCENE-4574.patch, LUCENE-4574.patch, LUCENE-4574.patch, 
> LUCENE-4574.patch, Test_for_LUCENE-4574.patch
>
>
> I was working on a custom ValueSource and did some basic profiling and 
> debugging to see if it was being used optimally.  To my surprise, the value 
> was being fetched twice per document in a row.  This computation isn't 
> exactly cheap to calculate so this is a big problem.  I was able to 
> work-around this problem trivially on my end by caching the last value with 
> corresponding docid in my FunctionValues implementation.
> Here is an excerpt of the code path to the first execution:
> {noformat}
>         at 
> org.apache.lucene.queries.function.docvalues.DoubleDocValues.floatVal(DoubleDocValues.java:48)
>         at 
> org.apache.lucene.queries.function.FunctionQuery$AllScorer.score(FunctionQuery.java:153)
>         at 
> org.apache.lucene.search.TopFieldCollector$OneComparatorScoringMaxScoreCollector.collect(TopFieldCollector.java:291)
>         at org.apache.lucene.search.Scorer.score(Scorer.java:62)
>         at 
> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:588)
>         at 
> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:280)
> {noformat}
> And here is the 2nd call:
> {noformat}
>         at 
> org.apache.lucene.queries.function.docvalues.DoubleDocValues.floatVal(DoubleDocValues.java:48)
>         at 
> org.apache.lucene.queries.function.FunctionQuery$AllScorer.score(FunctionQuery.java:153)
>         at 
> org.apache.lucene.search.ScoreCachingWrappingScorer.score(ScoreCachingWrappingScorer.java:56)
>         at 
> org.apache.lucene.search.FieldComparator$RelevanceComparator.copy(FieldComparator.java:951)
>         at 
> org.apache.lucene.search.TopFieldCollector$OneComparatorScoringMaxScoreCollector.collect(TopFieldCollector.java:312)
>         at org.apache.lucene.search.Scorer.score(Scorer.java:62)
>         at 
> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:588)
>         at 
> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:280)
> {noformat}
> The 2nd call appears to use some score caching mechanism, which is all well 
> and good, but that same mechanism wasn't used in the first call so there's no 
> cached value to retrieve.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (LUCENE-4574) FunctionQuery ValueSource value computed twice per document

Reply via email to