[
https://issues.apache.org/jira/browse/SOLR-16561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17640801#comment-17640801
]
Justin Sweeney commented on SOLR-16561:
---------------------------------------
I opened a similar issue a little while back that includes something like this
as well as a few other improvements related to PULL replicas that I noticed:
[https://issues.apache.org/jira/projects/SOLR/issues/SOLR-16487].
I took a somewhat different approach to the replication frequency by factoring
in when a new searcher is actually opened as well as when a hard commit is done,
since both are important in this context. You can see that approach in the
patch attached to the linked issue, which I think will work a bit better across
a variety of configurations.
> Use autoSoftCommmitMaxTime as preferred poll interval of IndexFetcher
> ---------------------------------------------------------------------
>
> Key: SOLR-16561
> URL: https://issues.apache.org/jira/browse/SOLR-16561
> Project: Solr
> Issue Type: Improvement
> Components: replication (java)
> Affects Versions: 8.8.2
> Reporter: Hang Sun
> Assignee: Shawn Heisey
> Priority: Minor
> Labels: replication-performance
> Attachments: SOLR-16561.patch
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> TLOG/PULL replicas use *IndexFetcher* to fetch segment files from leaders.
> Once new segment files are downloaded and merged into the existing index, a
> new Searcher is opened so the updated data is made available to clients. The
> poll interval is determined by the following code in *ReplicateFromLeader*:
> {code:java}
> if (uinfo.autoCommmitMaxTime != -1) {
>   pollIntervalStr = toPollIntervalStr(uinfo.autoCommmitMaxTime/2);
> } else if (uinfo.autoSoftCommmitMaxTime != -1) {
>   pollIntervalStr = toPollIntervalStr(uinfo.autoSoftCommmitMaxTime/2);
> }
> {code}
>
> In a typical config for replication using TLOG/PULL replicas, where data
> visibility is less important (a trade-off to avoid NRT replicas), we set a
> short commit time to persist changes and a long soft-commit time to make
> changes visible.
>
> {code:xml}
> <autoCommit>
>   <maxTime>15000</maxTime>
>   <openSearcher>false</openSearcher>
> </autoCommit>
> <autoSoftCommit>
>   <maxTime>3600000</maxTime>
> </autoSoftCommit>
> {code}
>
> With the above config, the poll interval will be 15/2 = 7 sec (rounded down
> to whole seconds). This leads to frequent opening of new Searchers, which has
> a large impact on real-time user queries, especially if the new Searcher
> takes a long time to warm up. It also makes changes visible on followers
> ahead of leaders.
> Polling for new segment files is more about visibility, because TLOG
> replicas still get updates to tlog files via UpdateHandler (this is my
> understanding). It therefore seems more appropriate to use
> *autoSoftCommmitMaxTime* as the poll interval.
> I propose the change below, where *autoSoftCommmitMaxTime* is chosen as the
> preferred interval. This will make the poll interval much longer and bring
> the visibility order more in line with the eventual consistency pattern.
>
> {code:java}
> if (uinfo.autoSoftCommmitMaxTime != -1) {
>   pollIntervalStr = toPollIntervalStr(uinfo.autoSoftCommmitMaxTime);
> } else if (uinfo.autoCommmitMaxTime != -1) {
>   pollIntervalStr = toPollIntervalStr(uinfo.autoCommmitMaxTime);
> }
> {code}
> The change has been tried and showed much less impact on realtime queries.
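
To make the arithmetic in the description concrete, here is a minimal
standalone sketch that compares the two selection orders for the example
config (autoCommit 15000 ms, autoSoftCommit 3600000 ms). It is not Solr code:
the class PollIntervalSketch and the methods currentInterval and
proposedInterval are hypothetical names, the toPollIntervalStr here is a
stand-in that assumes an h:m:s string truncated to whole seconds, and
returning null when neither time is set is only a placeholder for "keep
whatever default the caller already has".

```java
public class PollIntervalSketch {

    // Stand-in helper (assumption, not copied from Solr): formats
    // milliseconds as "h:m:s", truncating to whole seconds.
    static String toPollIntervalStr(int ms) {
        int sec = ms / 1000;
        int hour = sec / 3600;
        sec %= 3600;
        int min = sec / 60;
        sec %= 60;
        return hour + ":" + min + ":" + sec;
    }

    // Selection order quoted in the issue as the current behaviour:
    // prefer the hard-commit time, and halve whichever value is used.
    static String currentInterval(int autoCommitMaxTime, int autoSoftCommitMaxTime) {
        if (autoCommitMaxTime != -1) {
            return toPollIntervalStr(autoCommitMaxTime / 2);
        } else if (autoSoftCommitMaxTime != -1) {
            return toPollIntervalStr(autoSoftCommitMaxTime / 2);
        }
        return null; // placeholder: neither time is configured
    }

    // Selection order proposed in the issue: prefer the soft-commit
    // time, and use the value without halving it.
    static String proposedInterval(int autoCommitMaxTime, int autoSoftCommitMaxTime) {
        if (autoSoftCommitMaxTime != -1) {
            return toPollIntervalStr(autoSoftCommitMaxTime);
        } else if (autoCommitMaxTime != -1) {
            return toPollIntervalStr(autoCommitMaxTime);
        }
        return null; // placeholder: neither time is configured
    }

    public static void main(String[] args) {
        int autoCommit = 15_000;        // <autoCommit><maxTime>
        int autoSoftCommit = 3_600_000; // <autoSoftCommit><maxTime>
        System.out.println("current:  " + currentInterval(autoCommit, autoSoftCommit));
        System.out.println("proposed: " + proposedInterval(autoCommit, autoSoftCommit));
    }
}
```

Under these assumptions the current order yields a poll roughly every 7
seconds for the example config, while the proposed order yields one poll per
hour, which matches the difference described in the issue.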