After upgrading to Solr 8.7 from 5.2.1 seems that asking for 'score' in 'fl' causes the query to take almost twice (or so) the time

[email protected] Tue, 17 Aug 2021 04:13:35 -0700

Hello,



In performance tests conducted on the same system, comparing its performance 
with Solr 5.2.1 verses 8.7, was found a degradation the query execution time.

The system uses a custom query parser, which constructs quite a large query 
object.

When removing the ‘score’ parameter from the ‘fl’ list, the times were back as 
in Solr 5.

Trying to trouble shoot using jstack (small bash script that take a jstack of 
the Solr thread every few milliseconds) found that the Solr 8 thread “spends” 
time in the stack: (was appearing in several jstack snippets) :

"qtp2038105753-19" #19 prio=5 os_prio=0 cpu=1837.18ms elapsed=10334.56s 
tid=0x00007f7da0973000 nid=0x6ddd runnable  [0x00007f7d6d5f2000]

   java.lang.Thread.State: RUNNABLE

        at 
sun.nio.ch.FileDispatcherImpl.pread0([email protected]/Native<mailto:[email protected]/Native<mailto:[email protected]/Native%3cmailto:[email protected]/Native>>
 Method)

        at 
sun.nio.ch.FileDispatcherImpl.pread([email protected]/FileDispatcherImpl.java:54<mailto:[email protected]/FileDispatcherImpl.java:54<mailto:[email protected]/FileDispatcherImpl.java:54%3cmailto:[email protected]/FileDispatcherImpl.java:54>>)

        at 
sun.nio.ch.IOUtil.readIntoNativeBuffer([email protected]/IOUtil.java:274<mailto:[email protected]/IOUtil.java:274<mailto:[email protected]/IOUtil.java:274%3cmailto:[email protected]/IOUtil.java:274>>)

        at 
sun.nio.ch.IOUtil.read([email protected]/IOUtil.java:245<mailto:[email protected]/IOUtil.java:245<mailto:[email protected]/IOUtil.java:245%3cmailto:[email protected]/IOUtil.java:245>>)

        at 
sun.nio.ch.FileChannelImpl.readInternal([email protected]/FileChannelImpl.java:811<mailto:[email protected]/FileChannelImpl.java:811<mailto:[email protected]/FileChannelImpl.java:811%3cmailto:[email protected]/FileChannelImpl.java:811>>)

        at 
sun.nio.ch.FileChannelImpl.read([email protected]/FileChannelImpl.java:796<mailto:[email protected]/FileChannelImpl.java:796<mailto:[email protected]/FileChannelImpl.java:796%3cmailto:[email protected]/FileChannelImpl.java:796>>)

        at 
org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:170)

        at 
org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:315)

        at 
org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:254)

        at 
org.apache.lucene.util.fst.ReverseRandomAccessReader.readByte(ReverseRandomAccessReader.java:33)

        at org.apache.lucene.util.fst.FST.readLabel(FST.java:596)

        at org.apache.lucene.util.fst.FST.readArc(FST.java:1259)

        at org.apache.lucene.util.fst.FST.readNextRealArc(FST.java:1247)

        at org.apache.lucene.util.fst.FST.readFirstRealTargetArc(FST.java:1091)

        at org.apache.lucene.util.fst.FST.findTargetArc(FST.java:1396)

        at 
org.apache.lucene.codecs.blocktree.SegmentTermsEnum.seekExact(SegmentTermsEnum.java:485)

        at 
org.apache.lucene.index.FilterLeafReader$FilterTermsEnum.seekExact(FilterLeafReader.java:184)

        at org.apache.lucene.index.TermStates.loadTermsEnum(TermStates.java:124)

        at org.apache.lucene.index.TermStates.build(TermStates.java:109)

        at org.apache.lucene.search.PhraseQuery$1.getStats(PhraseQuery.java:447)

        at org.apache.lucene.search.PhraseWeight.<init>(PhraseWeight.java:38)

        at org.apache.lucene.search.PhraseQuery$1.<init>(PhraseQuery.java:429)

        at 
org.apache.lucene.search.PhraseQuery.createWeight(PhraseQuery.java:429)

        at org.apache.lucene.search.BoostQuery.createWeight(BoostQuery.java:125)

        at 
org.apache.lucene.search.IndexSearcher.createWeight(IndexSearcher.java:726)

        at org.apache.lucene.search.BooleanWeight.<init>(BooleanWeight.java:63)

        at 
org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:231)

        at 
org.apache.lucene.search.IndexSearcher.createWeight(IndexSearcher.java:726)

        at 
org.apache.lucene.search.TopFieldCollector.populateScores(TopFieldCollector.java:563)

        at 
org.apache.solr.search.Grouping$Command.populateScoresIfNecessary(Grouping.java:593)

        at 
org.apache.solr.search.Grouping$CommandFunc.finish(Grouping.java:1007)

        at org.apache.solr.search.Grouping.execute(Grouping.java:408)

        at 
org.apache.solr.handler.component.QueryComponent.doProcessGroupedSearch(QueryComponent.java:1486)

        at 
org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:395)

        at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:360)



This doesn’t appear in any jstack snippet taken of the Solr 5 thread, when 
‘score’ is also asked for.

Trying to look a bit into the code, I see that in solr 8 the calculation of 
score “if needed” is performed as a separate phase, after the document 
collection flow (might be that its like collecting the documents again in a 
way? I/O etc. ) :



org.apache.solr.search.Grouping.execute():

   if (!collectors.isEmpty()) {

      Collector secondPhaseCollectors = 
MultiCollector.wrap(collectors.toArray(new Collector[collectors.size()]));

      if (collectors.size() > 0) {

        if (cachedCollector != null) {

          if (cachedCollector.isCached()) {

            cachedCollector.replay(secondPhaseCollectors);

          } else {

            signalCacheWarning = true;

            log.warn(String.format(Locale.ROOT, "The grouping cache is active, 
but not used because it exceeded the max cache limit of %d percent", 
maxDocsPercentageToCache));

            log.warn("Please increase cache size or disable group caching.");

            searchWithTimeLimiter(luceneFilter, secondPhaseCollectors);

          }

        } else {

          if (pf.postFilter != null) {

            pf.postFilter.setLastDelegate(secondPhaseCollectors);

            secondPhaseCollectors = pf.postFilter;

          }

          searchWithTimeLimiter(luceneFilter, secondPhaseCollectors); //->seems 
to me this is the main document collection flow

        }

        if (secondPhaseCollectors instanceof DelegatingCollector) {

          ((DelegatingCollector) secondPhaseCollectors).finish();

        }

      }

    }



    for (@SuppressWarnings({"rawtypes"})Command cmd : commands) {

      cmd.finish();

    }



org.apache.solr.search.Grouping.CommandFunc:

@Override

    @SuppressWarnings({"unchecked"})

    protected void finish() throws IOException {

      if (secondPass != null) {

        result = secondPass.getTopGroups(0);

        populateScoresIfNecessary();

      }





org.apache.solr.search.Grouping.Command<T>:

protected void populateScoresIfNecessary() throws IOException {

      if (needScores) {

        for (GroupDocs<?> groups : result.groups) {

          TopFieldCollector.populateScores(groups.scoreDocs, searcher, query);

        }

      }

    }





I know that asking for the score has a cost of performance, but in Solr 5 it 
didn’t seem to be meaningful and in Solr 8 it seems to be very expansive.

I tried to compare the Solr 5 code, and it’s indeed different, this code 
section doesn’t exist there is these classes.

Can anyone assist in understanding this issue?

Is this code change causing the score calculation upon request in ‘fl’ to 
perform again collection of the documents in order to add the score to them?

Can this be prevented with some kind of configuration to behave as in Solr 5?



Thank you,

After upgrading to Solr 8.7 from 5.2.1 seems that asking for 'score' in 'fl' causes the query to take almost twice (or so) the time

Reply via email to