[
https://issues.apache.org/jira/browse/SOLR-15586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17400988#comment-17400988
]
Yael Mushinsky commented on SOLR-15586:
---------------------------------------
Tried to remove the code colored in green with small patch:
org.apache.solr.search.Grouping.CommandFunc:
@Override
@SuppressWarnings(\{"unchecked"})
protected void finish() throws IOException {
if (secondPass != null)
{ result = secondPass.getTopGroups(0);
{color:#00875a}*populateScoresIfNecessary();* {color}
}
I get the scores and everything seems working ... what is it's purpose?
> After upgrading to Solr 8.7 from 5.2.1 seems that asking for 'score' in 'fl'
> causes the query to take almost twice (or so) the time (regression?)
> -------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: SOLR-15586
> URL: https://issues.apache.org/jira/browse/SOLR-15586
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Affects Versions: 8.7
> Reporter: Yael Mushinsky
> Priority: Major
>
> In performance tests conducted on the same system, comparing its performance
> with Solr 5.2.1 verses 8.7, was found a degradation in the query execution
> time.
> The system uses a custom query parser, which constructs quite a large query
> object.
> When removing the ‘score’ parameter from the ‘fl’ list, the times were back
> as in Solr 5.
> Trying to trouble shoot using jstack (small bash script that take a jstack of
> the Solr thread every few milliseconds) found that the Solr 8 thread “spends”
> time in the stack: (was appearing in several jstack snippets)
> "qtp2038105753-19" #19 prio=5 os_prio=0 cpu=1837.18ms elapsed=10334.56s
> tid=0x00007f7da0973000 nid=0x6ddd runnable [0x00007f7d6d5f2000]
> java.lang.Thread.State: RUNNABLE
> at
> sun.nio.ch.FileDispatcherImpl.pread0([[email protected]/Native|mailto:[email protected]/Native%3cmailto:[email protected]/Native]
> Method)
> at
> sun.nio.ch.FileDispatcherImpl.pread([[email protected]/FileDispatcherImpl.java:54|mailto:[email protected]/FileDispatcherImpl.java:54%3cmailto:[email protected]/FileDispatcherImpl.java:54])
> at
> sun.nio.ch.IOUtil.readIntoNativeBuffer([[email protected]/IOUtil.java:274|mailto:[email protected]/IOUtil.java:274%3cmailto:[email protected]/IOUtil.java:274])
> at
> sun.nio.ch.IOUtil.read([[email protected]/IOUtil.java:245|mailto:[email protected]/IOUtil.java:245%3cmailto:[email protected]/IOUtil.java:245])
> at
> sun.nio.ch.FileChannelImpl.readInternal([[email protected]/FileChannelImpl.java:811|mailto:[email protected]/FileChannelImpl.java:811%3cmailto:[email protected]/FileChannelImpl.java:811])
> at
> sun.nio.ch.FileChannelImpl.read([[email protected]/FileChannelImpl.java:796|mailto:[email protected]/FileChannelImpl.java:796%3cmailto:[email protected]/FileChannelImpl.java:796])
> at
> org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:170)
> at
> org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:315)
> at
> org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:254)
> at
> org.apache.lucene.util.fst.ReverseRandomAccessReader.readByte(ReverseRandomAccessReader.java:33)
> at org.apache.lucene.util.fst.FST.readLabel(FST.java:596)
> at org.apache.lucene.util.fst.FST.readArc(FST.java:1259)
> at org.apache.lucene.util.fst.FST.readNextRealArc(FST.java:1247)
> at
> org.apache.lucene.util.fst.FST.readFirstRealTargetArc(FST.java:1091)
> at org.apache.lucene.util.fst.FST.findTargetArc(FST.java:1396)
> at
> org.apache.lucene.codecs.blocktree.SegmentTermsEnum.seekExact(SegmentTermsEnum.java:485)
> at
> org.apache.lucene.index.FilterLeafReader$FilterTermsEnum.seekExact(FilterLeafReader.java:184)
> at
> org.apache.lucene.index.TermStates.loadTermsEnum(TermStates.java:124)
> at org.apache.lucene.index.TermStates.build(TermStates.java:109)
> at
> org.apache.lucene.search.PhraseQuery$1.getStats(PhraseQuery.java:447)
> at org.apache.lucene.search.PhraseWeight.<init>(PhraseWeight.java:38)
> at org.apache.lucene.search.PhraseQuery$1.<init>(PhraseQuery.java:429)
> at
> org.apache.lucene.search.PhraseQuery.createWeight(PhraseQuery.java:429)
> at
> org.apache.lucene.search.BoostQuery.createWeight(BoostQuery.java:125)
> at
> org.apache.lucene.search.IndexSearcher.createWeight(IndexSearcher.java:726)
> at
> org.apache.lucene.search.BooleanWeight.<init>(BooleanWeight.java:63)
> at
> org.apache.lucene.search.BooleanQuery.createWeight(BooleanQuery.java:231)
> at
> org.apache.lucene.search.IndexSearcher.createWeight(IndexSearcher.java:726)
> at
> org.apache.lucene.search.TopFieldCollector.populateScores(TopFieldCollector.java:563)
> at
> org.apache.solr.search.Grouping$Command.populateScoresIfNecessary(Grouping.java:593)
> at
> org.apache.solr.search.Grouping$CommandFunc.finish(Grouping.java:1007)
> at org.apache.solr.search.Grouping.execute(Grouping.java:408)
> at
> org.apache.solr.handler.component.QueryComponent.doProcessGroupedSearch(QueryComponent.java:1486)
> at
> org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:395)
> at
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:360)
>
> This doesn’t appear in any jstack snippet taken of the Solr 5 thread, when
> ‘score’ is also asked for.
> Trying to look a bit into the code, I see that in solr 8 the calculation of
> score “if needed” is performed as a separate phase, after the document
> collection flow (might be that its like collecting the documents again in a
> way? I/O etc. ) :
>
> org.apache.solr.search.Grouping.execute():
> if (!collectors.isEmpty()) {
> Collector secondPhaseCollectors =
> MultiCollector.wrap(collectors.toArray(new Collector[collectors.size()]));
> if (collectors.size() > 0) {
> if (cachedCollector != null) {
> if (cachedCollector.isCached()) {
> cachedCollector.replay(secondPhaseCollectors);
> } else {
> signalCacheWarning = true;
> log.warn(String.format(Locale.ROOT, "The grouping cache is
> active, but not used because it exceeded the max cache limit of %d percent",
> maxDocsPercentageToCache));
> log.warn("Please increase cache size or disable group caching.");
> searchWithTimeLimiter(luceneFilter, secondPhaseCollectors);
> }
> } else {
> if (pf.postFilter != null) {
> pf.postFilter.setLastDelegate(secondPhaseCollectors);
> secondPhaseCollectors = pf.postFilter;
> }
> searchWithTimeLimiter(luceneFilter, secondPhaseCollectors);
> //->seems to me this is the main document collection flow
> }
> if (secondPhaseCollectors instanceof DelegatingCollector) {
> ((DelegatingCollector) secondPhaseCollectors).finish();
> }
> }
> }
>
> for (@SuppressWarnings(\{"rawtypes"})Command cmd : commands) {
> cmd.finish();
> }
>
> org.apache.solr.search.Grouping.CommandFunc:
> @Override
> @SuppressWarnings(\{"unchecked"})
> protected void finish() throws IOException {
> if (secondPass != null) {
> result = secondPass.getTopGroups(0);
> populateScoresIfNecessary();
> }
>
>
> org.apache.solr.search.Grouping.Command<T>:
> protected void populateScoresIfNecessary() throws IOException {
> if (needScores) {
> for (GroupDocs<?> groups : result.groups) {
> TopFieldCollector.populateScores(groups.scoreDocs, searcher, query);
> }
> }
> }
>
>
> I know that asking for the score has a cost of performance, but in Solr 5 it
> didn’t seem to be meaningful and in Solr 8 it seems to be very expansive.
> (In means of code, it seems the code is Solr 5 is different, I didn’t get
> into finding exactly how and where it calculates the score to add to the
> document upon request, but in means of timing, it doesn’t seem to have such
> an effect.)
> Tried asking about this over the users mailing list, but didn't get an answer
> ...
> Can you please assist in understanding the change in performance?
> Thank you.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]