Patson Luk created SOLR-16560:
---------------------------------

             Summary: Pull replication appears to consume excessive CPU as it 
does not use Segment Reader pooling
                 Key: SOLR-16560
                 URL: https://issues.apache.org/jira/browse/SOLR-16560
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
          Components: SolrCloud
    Affects Versions: 8.8, main (10.0)
            Reporter: Patson Luk


While experimenting with adding PULL replicas to our Solr cluster, we found 
from profiling that `IndexFetcher` imposes much more CPU overhead than 
anticipated. Most of that CPU time is spent in `SegmentReader.init`, which is 
called by `SolrCore.openNewSearch` from the `IndexFetcher`.

With some debugging, we found that on every replication of an updated 
collection, a new `SegmentReader` is created for every segment of that 
collection (not only the segments that were pulled down). In contrast, a 
`SolrCore.openNewSearch` triggered by a regular commit creates a 
`SegmentReader` ONLY for new segments; the `SegmentReader`s for old segments 
are obtained from the `ReaderPool`.

Unfortunately, this pooling does not work for `IndexFetcher`, as it opens a 
new `IndexWriter` on every run 
[here|https://github.com/apache/solr/blob/main/solr/core/src/java/org/apache/solr/handler/IndexFetcher.java#L749],
which creates a new `ReaderPool`.
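
To illustrate the cost difference, here is a toy Java model of the two 
behaviours. It is only a sketch and not Lucene or Solr code: `ToyReader`, 
`ToyPool` and `countOpens` are made-up names, and the segment lists are 
invented for the example. It counts how many readers get constructed when one 
pool is kept across refreshes versus when a fresh pool is created on every 
refresh.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model only: none of these types are Lucene or Solr classes. */
public class ReaderPoolSketch {

    /** Hypothetical stand-in for an expensive per-segment reader. */
    static final class ToyReader {
        final String segment;
        ToyReader(String segment) { this.segment = segment; }
    }

    /** Hypothetical stand-in for a reader pool: one cached reader per segment name. */
    static final class ToyPool {
        private final Map<String, ToyReader> readers = new HashMap<>();
        private int opened = 0;

        /** Returns the cached reader, constructing (and counting) it only on a miss. */
        ToyReader get(String segment) {
            return readers.computeIfAbsent(segment, s -> {
                opened++;
                return new ToyReader(s);
            });
        }

        int opened() { return opened; }
    }

    /**
     * Walks through successive index generations and returns the total number
     * of readers constructed. With sharePool=true a single pool survives all
     * refreshes; with sharePool=false each refresh starts from an empty pool.
     */
    public static int countOpens(List<List<String>> generations, boolean sharePool) {
        ToyPool shared = new ToyPool();
        int total = 0;
        for (List<String> segments : generations) {
            ToyPool pool = sharePool ? shared : new ToyPool();
            int before = pool.opened();
            for (String segment : segments) {
                pool.get(segment);
            }
            total += pool.opened() - before;
        }
        return total;
    }

    public static void main(String[] args) {
        // Three refreshes; each one adds a single new segment.
        List<List<String>> generations = List.of(
            List.of("_0", "_1", "_2"),
            List.of("_0", "_1", "_2", "_3"),
            List.of("_0", "_1", "_2", "_3", "_4"));

        System.out.println("shared pool: " + countOpens(generations, true));
        System.out.println("new pool per refresh: " + countOpens(generations, false));
    }
}
```

In this toy run the shared pool constructs 5 readers in total, while the 
pool-per-refresh variant constructs 12 (3 + 4 + 5), so the work grows with the 
total segment count rather than with the number of new segments.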

I am not familiar enough with the code to tell whether this new `IndexWriter` 
is always needed, but opening a `SegmentReader` for every segment of a 
collection seems excessive for pull replication.

Any thoughts on this please? Many thanks!! :)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
