[ https://issues.apache.org/jira/browse/SOLR-6554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14229939#comment-14229939 ]
Shalin Shekhar Mangar commented on SOLR-6554:
---------------------------------------------

Actually, the improvement in the Overseer for stateFormat=1 (the default case) is much better than I expected. After the refactoring, the amILeader calls are very infrequent (2 on trunk versus 223 on branch_5x in this run) and the speedup is about 40% (totalTime of the Overseer loop falls from 336777.887 on branch_5x to 201411.465265 on trunk):

{code}
Overseer queue size: 20000 state requests

stateFormat = 1, With refactoring (trunk)
=========================================
216071 T12 oasc.OverseerTest.testPerformance Overseer loop finished processing:
216072 T12 oasc.OverseerTest.printTimingStats totalTime: 201411.465265
216072 T12 oasc.OverseerTest.printTimingStats avgRequestsPerMinute: 0.004964922311489345
216073 T12 oasc.OverseerTest.printTimingStats 5minRateRequestsPerMinute: 0.0
216073 T12 oasc.OverseerTest.printTimingStats 15minRateRequestsPerMinute: 0.0
216073 T12 oasc.OverseerTest.printTimingStats avgTimePerRequest: 201411.465265
216073 T12 oasc.OverseerTest.printTimingStats medianRequestTime: 201411.465265
216073 T12 oasc.OverseerTest.printTimingStats 75thPctlRequestTime: 201411.465265
216074 T12 oasc.OverseerTest.printTimingStats 95thPctlRequestTime: 201411.465265
216074 T12 oasc.OverseerTest.printTimingStats 99thPctlRequestTime: 201411.465265
216074 T12 oasc.OverseerTest.printTimingStats 999thPctlRequestTime: 201411.465265
216075 T12 oasc.OverseerTest.testPerformance op: am_i_leader, success: 2, failure: 0
216075 T12 oasc.OverseerTest.printTimingStats totalTime: 9.377281
216075 T12 oasc.OverseerTest.printTimingStats avgRequestsPerMinute: 0.5969575423185497
216075 T12 oasc.OverseerTest.printTimingStats 5minRateRequestsPerMinute: 12.529098642264385
216075 T12 oasc.OverseerTest.printTimingStats 15minRateRequestsPerMinute: 19.324759776433687
216075 T12 oasc.OverseerTest.printTimingStats avgTimePerRequest: 4.6886405
216076 T12 oasc.OverseerTest.printTimingStats medianRequestTime: 4.6886405
216076 T12 oasc.OverseerTest.printTimingStats 75thPctlRequestTime: 9.022041
216076 T12 oasc.OverseerTest.printTimingStats 95thPctlRequestTime: 9.022041
216076 T12 oasc.OverseerTest.printTimingStats 99thPctlRequestTime: 9.022041
216076 T12 oasc.OverseerTest.printTimingStats 999thPctlRequestTime: 9.022041
216077 T12 oasc.OverseerTest.testPerformance op: update_state, success: 135, failure: 0
216077 T12 oasc.OverseerTest.printTimingStats totalTime: 61.333751
216077 T12 oasc.OverseerTest.printTimingStats avgRequestsPerMinute: 40.31065112174398
216077 T12 oasc.OverseerTest.printTimingStats 5minRateRequestsPerMinute: 48.0
216078 T12 oasc.OverseerTest.printTimingStats 15minRateRequestsPerMinute: 48.0
216078 T12 oasc.OverseerTest.printTimingStats avgTimePerRequest: 0.4543240814814815
216078 T12 oasc.OverseerTest.printTimingStats medianRequestTime: 0.364217
216078 T12 oasc.OverseerTest.printTimingStats 75thPctlRequestTime: 0.409896
216078 T12 oasc.OverseerTest.printTimingStats 95thPctlRequestTime: 0.9332719999999994
216079 T12 oasc.OverseerTest.printTimingStats 99thPctlRequestTime: 3.576287319999995
216079 T12 oasc.OverseerTest.printTimingStats 999thPctlRequestTime: 3.700744
216079 T12 oasc.OverseerTest.testPerformance op: state, success: 20001, failure: 0
216081 T12 oasc.OverseerTest.printTimingStats totalTime: 13344.072646
216081 T12 oasc.OverseerTest.printTimingStats avgRequestsPerMinute: 5973.226142698651
216081 T12 oasc.OverseerTest.printTimingStats 5minRateRequestsPerMinute: 4437.949777291698
216082 T12 oasc.OverseerTest.printTimingStats 15minRateRequestsPerMinute: 3247.958438006491
216082 T12 oasc.OverseerTest.printTimingStats avgTimePerRequest: 0.6671702737863107
216083 T12 oasc.OverseerTest.printTimingStats medianRequestTime: 0.6112960000000001
216083 T12 oasc.OverseerTest.printTimingStats 75thPctlRequestTime: 0.65861125
216083 T12 oasc.OverseerTest.printTimingStats 95thPctlRequestTime: 0.9373918
216083 T12 oasc.OverseerTest.printTimingStats 99thPctlRequestTime: 1.179823900000002
216083 T12 oasc.OverseerTest.printTimingStats 999thPctlRequestTime: 6.713780613000015

stateFormat = 1, Without refactoring (branch_5x):
=================================================
354435 T11 oasc.OverseerTest.testPerformance Overseer loop finished processing:
354437 T11 oasc.OverseerTest.printTimingStats totalTime: 336777.887
354438 T11 oasc.OverseerTest.printTimingStats avgRequestsPerMinute: 0.0029692955509913457
354438 T11 oasc.OverseerTest.printTimingStats 5minRateRequestsPerMinute: 0.0
354438 T11 oasc.OverseerTest.printTimingStats 15minRateRequestsPerMinute: 0.0
354439 T11 oasc.OverseerTest.printTimingStats avgTimePerRequest: 336777.887
354439 T11 oasc.OverseerTest.printTimingStats medianRequestTime: 336777.887
354439 T11 oasc.OverseerTest.printTimingStats 75thPctlRequestTime: 336777.887
354440 T11 oasc.OverseerTest.printTimingStats 95thPctlRequestTime: 336777.887
354440 T11 oasc.OverseerTest.printTimingStats 99thPctlRequestTime: 336777.887
354440 T11 oasc.OverseerTest.printTimingStats 999thPctlRequestTime: 336777.887
354441 T11 oasc.OverseerTest.testPerformance op: state, success: 20001, failure: 0
354444 T11 oasc.OverseerTest.printTimingStats totalTime: 13029.408
354444 T11 oasc.OverseerTest.printTimingStats avgRequestsPerMinute: 3570.0750281584515
354444 T11 oasc.OverseerTest.printTimingStats 5minRateRequestsPerMinute: 3169.209724490217
354445 T11 oasc.OverseerTest.printTimingStats 15minRateRequestsPerMinute: 2124.6849108211077
354445 T11 oasc.OverseerTest.printTimingStats avgTimePerRequest: 0.6514378281085945
354445 T11 oasc.OverseerTest.printTimingStats medianRequestTime: 0.59
354446 T11 oasc.OverseerTest.printTimingStats 75thPctlRequestTime: 0.633
354446 T11 oasc.OverseerTest.printTimingStats 95thPctlRequestTime: 0.8480999999999999
354446 T11 oasc.OverseerTest.printTimingStats 99thPctlRequestTime: 0.9995200000000004
354447 T11 oasc.OverseerTest.printTimingStats 999thPctlRequestTime: 1.736079000000002
354447 T11 oasc.OverseerTest.testPerformance op: update_state, success: 222, failure: 0
354448 T11 oasc.OverseerTest.printTimingStats totalTime: 98.244
354448 T11 oasc.OverseerTest.printTimingStats avgRequestsPerMinute: 39.622607985461286
354448 T11 oasc.OverseerTest.printTimingStats 5minRateRequestsPerMinute: 48.0
354448 T11 oasc.OverseerTest.printTimingStats 15minRateRequestsPerMinute: 48.0
354449 T11 oasc.OverseerTest.printTimingStats avgTimePerRequest: 0.44254054054054054
354449 T11 oasc.OverseerTest.printTimingStats medianRequestTime: 0.3835
354450 T11 oasc.OverseerTest.printTimingStats 75thPctlRequestTime: 0.463
354450 T11 oasc.OverseerTest.printTimingStats 95thPctlRequestTime: 0.7994499999999999
354450 T11 oasc.OverseerTest.printTimingStats 99thPctlRequestTime: 1.2152900000000026
354451 T11 oasc.OverseerTest.printTimingStats 999thPctlRequestTime: 2.452
354451 T11 oasc.OverseerTest.testPerformance op: am_i_leader, success: 223, failure: 0
354452 T11 oasc.OverseerTest.printTimingStats totalTime: 43.33
354453 T11 oasc.OverseerTest.printTimingStats avgRequestsPerMinute: 39.777330428482294
354453 T11 oasc.OverseerTest.printTimingStats 5minRateRequestsPerMinute: 57.7576718337744
354453 T11 oasc.OverseerTest.printTimingStats 15minRateRequestsPerMinute: 65.77963729636123
354453 T11 oasc.OverseerTest.printTimingStats avgTimePerRequest: 0.194304932735426
354454 T11 oasc.OverseerTest.printTimingStats medianRequestTime: 0.149
354454 T11 oasc.OverseerTest.printTimingStats 75thPctlRequestTime: 0.188
354454 T11 oasc.OverseerTest.printTimingStats 95thPctlRequestTime: 0.25839999999999996
354454 T11 oasc.OverseerTest.printTimingStats 99thPctlRequestTime: 0.47591999999999895
354455 T11 oasc.OverseerTest.printTimingStats 999thPctlRequestTime: 5.712
{code}

Do not compare these numbers with the previous ones, because this test was run on a different box. Also note that trunk ran on jdk1.8.0_25 and branch_5x ran on jdk1.7.0_25, so part of the difference may come from the JVM.

I'm running the other tests and will report back shortly.
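For readers who want to check the "about 40%" figure, here is a minimal sketch of the arithmetic, using only the totalTime and avgRequestsPerMinute values from the two runs in this comment. The class and method names are made up for illustration and are not part of OverseerTest; the time unit is whatever printTimingStats reports.

```java
public class OverseerSpeedupCheck {

    // Fraction of time saved: 1 - after/before.
    public static double timeReduction(double before, double after) {
        return 1.0 - after / before;
    }

    // Relative throughput gain: after/before - 1.
    public static double throughputGain(double before, double after) {
        return after / before - 1.0;
    }

    public static void main(String[] args) {
        // "Overseer loop finished processing" totalTime, copied from the logs.
        double branch5xTotal = 336777.887;
        double trunkTotal = 201411.465265;

        // avgRequestsPerMinute for "op: state", copied from the logs.
        double branch5xRate = 3570.0750281584515;
        double trunkRate = 5973.226142698651;

        System.out.printf("total time reduction: %.1f%%%n",
                100 * timeReduction(branch5xTotal, trunkTotal));
        System.out.printf("state op throughput gain: %.1f%%%n",
                100 * throughputGain(branch5xRate, trunkRate));
    }
}
```

The same numbers read as a roughly 40% cut in total time or, equivalently, a roughly two-thirds increase in state requests per minute, so it is worth saying which of the two is meant when quoting the result.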
> Speed up overseer operations for collections with stateFormat > 1
> -----------------------------------------------------------------
>
> Key: SOLR-6554
> URL: https://issues.apache.org/jira/browse/SOLR-6554
> Project: Solr
> Issue Type: Improvement
> Components: SolrCloud
> Affects Versions: 5.0, Trunk
> Reporter: Shalin Shekhar Mangar
> Attachments: SOLR-6554-batching-refactor.patch,
> SOLR-6554-batching-refactor.patch, SOLR-6554-batching-refactor.patch,
> SOLR-6554-batching-refactor.patch, SOLR-6554.patch, SOLR-6554.patch,
> SOLR-6554.patch, SOLR-6554.patch, SOLR-6554.patch, SOLR-6554.patch,
> SOLR-6554.patch, SOLR-6554.patch
>
>
> Right now (after SOLR-5473 was committed), a node watches a collection only
> if stateFormat=1 or if that node hosts at least one core belonging to that
> collection.
> This means that a node which is the overseer operates on all collections but
> watches only a few. So any read goes directly to zookeeper which slows down
> overseer operations.
> Let's have the overseer node watch all collections always and never remove
> those watches (except when the collection itself is deleted).
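The watch rule described in the issue can be sketched as a small in-memory model. This is only an illustration of the policy as stated above, not Solr's actual implementation: the class, its methods, and the idea of tracking watches in a plain set are all assumptions made for the example, and no ZooKeeper calls are involved.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of which collections a node keeps a watch on.
public class OverseerWatchSketch {
    private final boolean isOverseer;
    private final Set<String> watched = new HashSet<>();

    public OverseerWatchSketch(boolean isOverseer) {
        this.isOverseer = isOverseer;
    }

    // Rule after SOLR-5473, per the issue text: watch only if
    // stateFormat=1 or the node hosts a core of the collection.
    public static boolean watchesToday(int stateFormat, boolean hostsLocalCore) {
        return stateFormat == 1 || hostsLocalCore;
    }

    // Proposed rule: the overseer node watches every collection.
    public boolean shouldWatch(int stateFormat, boolean hostsLocalCore) {
        return isOverseer || watchesToday(stateFormat, hostsLocalCore);
    }

    public void onCollectionSeen(String name, int stateFormat, boolean hostsLocalCore) {
        if (shouldWatch(stateFormat, hostsLocalCore)) {
            watched.add(name);
        }
    }

    // Losing the last local core drops the watch on an ordinary node,
    // but never on the overseer.
    public void onLastLocalCoreRemoved(String name, int stateFormat) {
        if (!shouldWatch(stateFormat, false)) {
            watched.remove(name);
        }
    }

    // Deleting the collection is the only event that removes an overseer watch.
    public void onCollectionDeleted(String name) {
        watched.remove(name);
    }

    public boolean isWatching(String name) {
        return watched.contains(name);
    }
}
```

In this model an overseer node that sees a stateFormat=2 collection with no local core still ends up watching it, which is the case the issue says currently forces reads to go directly to ZooKeeper.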