[
https://issues.apache.org/jira/browse/SOLR-6554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14229939#comment-14229939
]
Shalin Shekhar Mangar commented on SOLR-6554:
---------------------------------------------
Actually, the improvements in Overseer for stateFormat=1 (the default case) are
much better than I expected. After the refactoring, the amILeader calls are
very infrequent (2 versus 223 in the runs below) and the overall speedup is
about 40% (Overseer loop totalTime of 201411 versus 336777):
{code}
Overseer queue size: 20000 state requests
stateFormat = 1, With refactoring (trunk)
=========================================
216071 T12 oasc.OverseerTest.testPerformance Overseer loop finished processing:
216072 T12 oasc.OverseerTest.printTimingStats totalTime: 201411.465265
216072 T12 oasc.OverseerTest.printTimingStats avgRequestsPerMinute: 0.004964922311489345
216073 T12 oasc.OverseerTest.printTimingStats 5minRateRequestsPerMinute: 0.0
216073 T12 oasc.OverseerTest.printTimingStats 15minRateRequestsPerMinute: 0.0
216073 T12 oasc.OverseerTest.printTimingStats avgTimePerRequest: 201411.465265
216073 T12 oasc.OverseerTest.printTimingStats medianRequestTime: 201411.465265
216073 T12 oasc.OverseerTest.printTimingStats 75thPctlRequestTime: 201411.465265
216074 T12 oasc.OverseerTest.printTimingStats 95thPctlRequestTime: 201411.465265
216074 T12 oasc.OverseerTest.printTimingStats 99thPctlRequestTime: 201411.465265
216074 T12 oasc.OverseerTest.printTimingStats 999thPctlRequestTime: 201411.465265
216075 T12 oasc.OverseerTest.testPerformance op: am_i_leader, success: 2, failure: 0
216075 T12 oasc.OverseerTest.printTimingStats totalTime: 9.377281
216075 T12 oasc.OverseerTest.printTimingStats avgRequestsPerMinute: 0.5969575423185497
216075 T12 oasc.OverseerTest.printTimingStats 5minRateRequestsPerMinute: 12.529098642264385
216075 T12 oasc.OverseerTest.printTimingStats 15minRateRequestsPerMinute: 19.324759776433687
216075 T12 oasc.OverseerTest.printTimingStats avgTimePerRequest: 4.6886405
216076 T12 oasc.OverseerTest.printTimingStats medianRequestTime: 4.6886405
216076 T12 oasc.OverseerTest.printTimingStats 75thPctlRequestTime: 9.022041
216076 T12 oasc.OverseerTest.printTimingStats 95thPctlRequestTime: 9.022041
216076 T12 oasc.OverseerTest.printTimingStats 99thPctlRequestTime: 9.022041
216076 T12 oasc.OverseerTest.printTimingStats 999thPctlRequestTime: 9.022041
216077 T12 oasc.OverseerTest.testPerformance op: update_state, success: 135, failure: 0
216077 T12 oasc.OverseerTest.printTimingStats totalTime: 61.333751
216077 T12 oasc.OverseerTest.printTimingStats avgRequestsPerMinute: 40.31065112174398
216077 T12 oasc.OverseerTest.printTimingStats 5minRateRequestsPerMinute: 48.0
216078 T12 oasc.OverseerTest.printTimingStats 15minRateRequestsPerMinute: 48.0
216078 T12 oasc.OverseerTest.printTimingStats avgTimePerRequest: 0.4543240814814815
216078 T12 oasc.OverseerTest.printTimingStats medianRequestTime: 0.364217
216078 T12 oasc.OverseerTest.printTimingStats 75thPctlRequestTime: 0.409896
216078 T12 oasc.OverseerTest.printTimingStats 95thPctlRequestTime: 0.9332719999999994
216079 T12 oasc.OverseerTest.printTimingStats 99thPctlRequestTime: 3.576287319999995
216079 T12 oasc.OverseerTest.printTimingStats 999thPctlRequestTime: 3.700744
216079 T12 oasc.OverseerTest.testPerformance op: state, success: 20001, failure: 0
216081 T12 oasc.OverseerTest.printTimingStats totalTime: 13344.072646
216081 T12 oasc.OverseerTest.printTimingStats avgRequestsPerMinute: 5973.226142698651
216081 T12 oasc.OverseerTest.printTimingStats 5minRateRequestsPerMinute: 4437.949777291698
216082 T12 oasc.OverseerTest.printTimingStats 15minRateRequestsPerMinute: 3247.958438006491
216082 T12 oasc.OverseerTest.printTimingStats avgTimePerRequest: 0.6671702737863107
216083 T12 oasc.OverseerTest.printTimingStats medianRequestTime: 0.6112960000000001
216083 T12 oasc.OverseerTest.printTimingStats 75thPctlRequestTime: 0.65861125
216083 T12 oasc.OverseerTest.printTimingStats 95thPctlRequestTime: 0.9373918
216083 T12 oasc.OverseerTest.printTimingStats 99thPctlRequestTime: 1.179823900000002
216083 T12 oasc.OverseerTest.printTimingStats 999thPctlRequestTime: 6.713780613000015
stateFormat = 1, Without refactoring (branch_5x)
================================================
354435 T11 oasc.OverseerTest.testPerformance Overseer loop finished processing:
354437 T11 oasc.OverseerTest.printTimingStats totalTime: 336777.887
354438 T11 oasc.OverseerTest.printTimingStats avgRequestsPerMinute: 0.0029692955509913457
354438 T11 oasc.OverseerTest.printTimingStats 5minRateRequestsPerMinute: 0.0
354438 T11 oasc.OverseerTest.printTimingStats 15minRateRequestsPerMinute: 0.0
354439 T11 oasc.OverseerTest.printTimingStats avgTimePerRequest: 336777.887
354439 T11 oasc.OverseerTest.printTimingStats medianRequestTime: 336777.887
354439 T11 oasc.OverseerTest.printTimingStats 75thPctlRequestTime: 336777.887
354440 T11 oasc.OverseerTest.printTimingStats 95thPctlRequestTime: 336777.887
354440 T11 oasc.OverseerTest.printTimingStats 99thPctlRequestTime: 336777.887
354440 T11 oasc.OverseerTest.printTimingStats 999thPctlRequestTime: 336777.887
354441 T11 oasc.OverseerTest.testPerformance op: state, success: 20001, failure: 0
354444 T11 oasc.OverseerTest.printTimingStats totalTime: 13029.408
354444 T11 oasc.OverseerTest.printTimingStats avgRequestsPerMinute: 3570.0750281584515
354444 T11 oasc.OverseerTest.printTimingStats 5minRateRequestsPerMinute: 3169.209724490217
354445 T11 oasc.OverseerTest.printTimingStats 15minRateRequestsPerMinute: 2124.6849108211077
354445 T11 oasc.OverseerTest.printTimingStats avgTimePerRequest: 0.6514378281085945
354445 T11 oasc.OverseerTest.printTimingStats medianRequestTime: 0.59
354446 T11 oasc.OverseerTest.printTimingStats 75thPctlRequestTime: 0.633
354446 T11 oasc.OverseerTest.printTimingStats 95thPctlRequestTime: 0.8480999999999999
354446 T11 oasc.OverseerTest.printTimingStats 99thPctlRequestTime: 0.9995200000000004
354447 T11 oasc.OverseerTest.printTimingStats 999thPctlRequestTime: 1.736079000000002
354447 T11 oasc.OverseerTest.testPerformance op: update_state, success: 222, failure: 0
354448 T11 oasc.OverseerTest.printTimingStats totalTime: 98.244
354448 T11 oasc.OverseerTest.printTimingStats avgRequestsPerMinute: 39.622607985461286
354448 T11 oasc.OverseerTest.printTimingStats 5minRateRequestsPerMinute: 48.0
354448 T11 oasc.OverseerTest.printTimingStats 15minRateRequestsPerMinute: 48.0
354449 T11 oasc.OverseerTest.printTimingStats avgTimePerRequest: 0.44254054054054054
354449 T11 oasc.OverseerTest.printTimingStats medianRequestTime: 0.3835
354450 T11 oasc.OverseerTest.printTimingStats 75thPctlRequestTime: 0.463
354450 T11 oasc.OverseerTest.printTimingStats 95thPctlRequestTime: 0.7994499999999999
354450 T11 oasc.OverseerTest.printTimingStats 99thPctlRequestTime: 1.2152900000000026
354451 T11 oasc.OverseerTest.printTimingStats 999thPctlRequestTime: 2.452
354451 T11 oasc.OverseerTest.testPerformance op: am_i_leader, success: 223, failure: 0
354452 T11 oasc.OverseerTest.printTimingStats totalTime: 43.33
354453 T11 oasc.OverseerTest.printTimingStats avgRequestsPerMinute: 39.777330428482294
354453 T11 oasc.OverseerTest.printTimingStats 5minRateRequestsPerMinute: 57.7576718337744
354453 T11 oasc.OverseerTest.printTimingStats 15minRateRequestsPerMinute: 65.77963729636123
354453 T11 oasc.OverseerTest.printTimingStats avgTimePerRequest: 0.194304932735426
354454 T11 oasc.OverseerTest.printTimingStats medianRequestTime: 0.149
354454 T11 oasc.OverseerTest.printTimingStats 75thPctlRequestTime: 0.188
354454 T11 oasc.OverseerTest.printTimingStats 95thPctlRequestTime: 0.25839999999999996
354454 T11 oasc.OverseerTest.printTimingStats 99thPctlRequestTime: 0.47591999999999895
354455 T11 oasc.OverseerTest.printTimingStats 999thPctlRequestTime: 5.712
{code}
Do not compare these numbers with the previous ones, because this test was run
on a different box. Also note that trunk ran on jdk1.8.0_25 while branch_5x ran
on jdk1.7.0_25, so part of the difference may come from the JVM rather than the
refactoring. I'm running the other tests and will report back shortly.
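
As a rough check of the "about 40%" figure, the two "Overseer loop" totalTime
values and the two avgRequestsPerMinute values for the state op can be compared
directly. This is only a sketch: the class and method names are made up for
illustration and are not part of OverseerTest; the four constants are copied
from the output above.

```java
// Illustrative only: OverseerSpeedup is a hypothetical helper, not Solr code.
class OverseerSpeedup {
    // Percentage by which the elapsed time shrank going from 'before' to 'after'.
    static double timeReductionPercent(double before, double after) {
        return (1.0 - after / before) * 100.0;
    }

    // Percentage by which the request rate grew going from 'before' to 'after'.
    static double throughputGainPercent(double before, double after) {
        return (after / before - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        // "Overseer loop finished processing" totalTime, as printed by the test.
        double totalTimeBranch5x = 336777.887;
        double totalTimeTrunk = 201411.465265;
        // avgRequestsPerMinute for "op: state".
        double stateRateBranch5x = 3570.0750281584515;
        double stateRateTrunk = 5973.226142698651;

        System.out.println("time reduction %: "
                + timeReductionPercent(totalTimeBranch5x, totalTimeTrunk));
        System.out.println("state throughput gain %: "
                + throughputGainPercent(stateRateBranch5x, stateRateTrunk));
    }
}
```

So the loop finishes in roughly 40% less time, which corresponds to roughly 67%
more state requests per minute. The JDK and hardware caveats above apply to
both figures.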
> Speed up overseer operations for collections with stateFormat > 1
> -----------------------------------------------------------------
>
> Key: SOLR-6554
> URL: https://issues.apache.org/jira/browse/SOLR-6554
> Project: Solr
> Issue Type: Improvement
> Components: SolrCloud
> Affects Versions: 5.0, Trunk
> Reporter: Shalin Shekhar Mangar
> Attachments: SOLR-6554-batching-refactor.patch,
> SOLR-6554-batching-refactor.patch, SOLR-6554-batching-refactor.patch,
> SOLR-6554-batching-refactor.patch, SOLR-6554.patch, SOLR-6554.patch,
> SOLR-6554.patch, SOLR-6554.patch, SOLR-6554.patch, SOLR-6554.patch,
> SOLR-6554.patch, SOLR-6554.patch
>
>
> Right now (after SOLR-5473 was committed), a node watches a collection only
> if stateFormat=1 or if that node hosts at least one core belonging to that
> collection.
> This means that a node which is the overseer operates on all collections but
> watches only a few. So any read goes directly to zookeeper which slows down
> overseer operations.
> Let's have the overseer node watch all collections always and never remove
> those watches (except when the collection itself is deleted).
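
The idea in the description can be pictured with a small in-memory model. This
is a sketch under assumptions, not the actual patch: WatchedStateCache and all
of its methods are hypothetical names, and the remoteRead function merely
stands in for a ZooKeeper read. It only shows why reads of unwatched
collections cost a round trip each time while watched collections are served
from memory.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical model; none of these names come from Solr's ZkStateReader.
class WatchedStateCache {
    private final Map<String, String> watched = new HashMap<>();
    private final Function<String, String> remoteRead; // stand-in for a ZooKeeper read
    private int remoteReads = 0;

    WatchedStateCache(Function<String, String> remoteRead) {
        this.remoteRead = remoteRead;
    }

    // Start watching: fetch the state once, then keep it in memory.
    void watch(String collection) {
        if (!watched.containsKey(collection)) {
            watched.put(collection, fetch(collection));
        }
    }

    // Called when a watch fires: refresh the cached copy without a read.
    void onStateChanged(String collection, String newState) {
        if (watched.containsKey(collection)) {
            watched.put(collection, newState);
        }
    }

    // The only case in which the proposal removes a watch.
    void onCollectionDeleted(String collection) {
        watched.remove(collection);
    }

    // Watched collections are answered from memory, others go remote.
    String read(String collection) {
        String cached = watched.get(collection);
        return cached != null ? cached : fetch(collection);
    }

    int remoteReads() {
        return remoteReads;
    }

    private String fetch(String collection) {
        remoteReads++;
        return remoteRead.apply(collection);
    }

    public static void main(String[] args) {
        WatchedStateCache cache = new WatchedStateCache(name -> "state-of-" + name);
        cache.read("c1");
        cache.read("c1");   // unwatched: every read goes remote
        cache.watch("c2");  // one remote read to prime the cache
        cache.read("c2");
        cache.read("c2");   // watched: served from memory
        System.out.println("remote reads: " + cache.remoteReads());
    }
}
```

In this model an overseer that calls watch() for every collection pays one
remote read per collection up front and none afterwards, which is the effect
the proposal is after.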