[ 
https://issues.apache.org/jira/browse/SOLR-15045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17413463#comment-17413463
 ] 

Mark Robert Miller commented on SOLR-15045:
-------------------------------------------

Oh wow, I think you are right that finish will not wait for commits from other 
threads. It has been a very long time since I’ve looked at a SolrCmdDistributor 
that didn’t just wait for outstanding requests in finish. That 
StreamingSolrServer was a terrible shiv to start with there. 

But if you are right, I still don’t see how calling block and do retries there 
solves anything. If it worked there, it would work when it’s called in finish, 
and the user would not get a response until all commits were done.

As far as I can tell, either the commits go through the same queue and runner 
threads that all the updates go through, in which case I still don’t see why 
you’d need that: block and do retries is called in finish. Or commits go 
through a standard request on the streaming Solr server somehow, via a standard 
HTTP Solr client style request, in which case block and do retries wouldn’t 
help anywhere. It’s called at the end of the request though, so I still don’t 
see how calling it there is also necessary. Maybe I’m missing something - I 
don’t keep up with that code on main. 

Finish and doFinish should be waiting for anything outstanding though.
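
To be concrete about what I mean by "waiting", here is a rough sketch of the 
shape I’d expect - purely illustrative, the class and names are made up and 
this is not the actual SolrCmdDistributor code:

{code:java}
// Illustrative sketch only (not Solr code): updates and commits are submitted
// the same way, land in the same pending set, and finish() blocks until every
// outstanding request has completed.
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class SketchCmdDistributor {
  private final ExecutorService executor = Executors.newFixedThreadPool(4);
  private final List<Future<?>> pending = new CopyOnWriteArrayList<>();

  void submit(Runnable requestToLeaderOrReplica) {
    // Updates and commits go through the same path, so finish() sees both.
    pending.add(executor.submit(requestToLeaderOrReplica));
  }

  void finish() throws Exception {
    // "Block and do retries": do not return to the caller until every
    // outstanding request, commit included, has come back. (A real
    // implementation would also retry failures here.)
    for (Future<?> f : pending) {
      f.get();
    }
    executor.shutdown();
  }
}
{code}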

I suppose what might be getting solved is the case where the commit goes out, 
you don’t wait, docs go out, different threads on the streaming server get the 
updates, the thread with the commit stalls, and the commit gets behind an 
update (sketched below). It used to be single threaded, but if I remember 
right, that’s configurable now. A bit wild if you ask me: reordering docs is 
not great for cloud, and it means n threads per request, and way more threads 
than CPUs starts costing … but anyway, as you said, a bit tangential to your 
issue. An async standard client fits this API when the streaming server does 
not, so I wouldn’t chase unrelated improvements here anyway.
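
To spell out the reordering I mean, a toy example (plain executor, nothing 
Solr-specific, names made up): submit a commit first and an update second to 
two runner threads, and the commit can still be applied last if its thread 
stalls.

{code:java}
// Toy illustration, not Solr code: with more than one runner thread draining
// the work, a commit submitted before an update can still execute after it.
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ReorderDemo {
  public static void main(String[] args) throws InterruptedException {
    ExecutorService runners = Executors.newFixedThreadPool(2); // n runner threads

    runners.submit(() -> {
      sleep(200); // the thread handling the commit stalls...
      System.out.println("commit applied");
    });
    runners.submit(() -> {
      System.out.println("update applied"); // ...and a later update overtakes it
    });

    runners.shutdown();
    runners.awaitTermination(5, TimeUnit.SECONDS);
  }

  private static void sleep(long ms) {
    try {
      Thread.sleep(ms);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}
{code}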

In regard to the issue you are after: no, I did not test anything, but I think 
I recall reviewing and confirming that the issue does exist and that the 
solution does address it. 

> 2x latency of synchronous commits due to serial execution on local and 
> distributed leaders
> ------------------------------------------------------------------------------------------
>
>                 Key: SOLR-15045
>                 URL: https://issues.apache.org/jira/browse/SOLR-15045
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>    Affects Versions: 8.5.2
>         Environment: Operating system: Linux (centos 7.7.1908)
>            Reporter: Raj Yadav
>            Priority: Major
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Hi All,
> When we issue a commit through a curl command, not all the shards get the 
> `start commit` request at the same time.
> *Solr Setup Details (running in SolrCloud mode):*
>  It has 6 shards, and each shard has only one replica (which is also the
>  leader); the replica type is NRT.
>  Each shard is hosted on a separate physical host.
> Zookeeper => We are using an external ZooKeeper ensemble (a separate 3-node
>  cluster).
> *Shard and Host name*
>  shard1_0=>solr_199
>  shard1_1=>solr_200
>  shard2_0=> solr_254
>  shard2_1=> solr_132
>  shard3_0=>solr_133
>  shard3_1=>solr_198
> *The request rate on the system is currently zero; only hourly indexing is*
>  *running on it.*
> We are using a curl command to issue the commit.
> {code:java}
> curl
> "http://solr_254:8389/solr/my_collection/update?openSearcher=true&commit=true&wt=json"{code}
> (Using solr_254 host to issue commit)
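> For reference, the same explicit commit can also be issued via SolrJ rather
> than curl; this is just a rough sketch assuming the same host and collection
> as above:
> {code:java}
> // Sketch: issuing the same explicit commit via SolrJ instead of curl.
> // Host and collection names are the same as in the curl example above.
> import org.apache.solr.client.solrj.impl.HttpSolrClient;
> import org.apache.solr.client.solrj.response.UpdateResponse;
> 
> public class CommitViaSolrJ {
>   public static void main(String[] args) throws Exception {
>     try (HttpSolrClient client =
>              new HttpSolrClient.Builder("http://solr_254:8389/solr").build()) {
>       // Explicit hard commit; openSearcher defaults to true for explicit commits.
>       UpdateResponse rsp = client.commit("my_collection", true, true);
>       System.out.println("commit status: " + rsp.getStatus());
>     }
>   }
> }
> {code}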
> On running the above command, all the shards start processing the commit (i.e.
>  they get the `start commit` request) except the one used in the curl command
>  (i.e. shard2_0, which is hosted on solr_254). Individually, each shard takes
>  around 10 to 12 minutes to process a hard commit (most of this time is spent
>  on reloading external files).
>  As per the logs, shard2_0 gets the `start commit` request after roughly 10
>  minutes. This leads to the following timeout error.
> {code:java}
> 2020-12-06 18:47:47.013 ERROR
> org.apache.solr.client.solrj.SolrServerException: Timeout occured while
> waiting response from server at:
> http://solr_132:9744/solr/my_collection_shard2_1_replica_n21/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2Fsolr_254%3A9744%2Fsolr%2Fmy_collection_shard2_0_replica_n11%2F
>       at
> org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:407)
>       at
> org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:753)
>       at
> org.apache.solr.client.solrj.impl.ConcurrentUpdateHttp2SolrClient.request(ConcurrentUpdateHttp2SolrClient.java:369)
>       at
> org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1290)
>       at
> org.apache.solr.update.SolrCmdDistributor.doRequest(SolrCmdDistributor.java:344)
>       at
> org.apache.solr.update.SolrCmdDistributor.lambda$submit$0(SolrCmdDistributor.java:333)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>       at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>       at
> com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:180)
>       at
> org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:210)
>       at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>       at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>       at java.lang.Thread.run(Thread.java:748)
>     Caused by: java.util.concurrent.TimeoutException
>       at
> org.eclipse.jetty.client.util.InputStreamResponseListener.get(InputStreamResponseListener.java:216)
>       at
> org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:398)
>       ... 13 more{code}
> The above timeout error is between solr_254 and solr_132. Similar errors occur
>  between solr_254 and the other 4 shards.
> Since the query load is zero, CPU utilization is mostly around 3%.
>  After issuing the curl commit command, CPU goes up to 14% on all shards except
>  shard2_0 (host: solr_254, the one used in the curl command).
>  And after 10 minutes (i.e. after getting the `start commit` request), CPU on
>  shard2_0 also goes up to 14%.
> As I mentioned earlier, each shard takes around 10-12 minutes to process the
>  commit, and due to the delay in starting the commit process on one shard
>  (shard2_0), our overall commit time is now doubled (22-24 minutes approx.).
> *We are observing this delay with both hard and soft commits.*
> In our Solr 5.4.0 setup (which is similar), we use a similar curl command to 
> issue the commit, and there all the shards get the `start commit` request at 
> the same time, including the one used in the curl command.
>  
> *Impact after deleting external files:*
> In order to nullify the impact of the external files, I deleted the external
> files from all the shards and issued a commit through the curl command. The
> commit operation completed in 3 seconds. Individual shards took 1.5 seconds to
> complete the commit operation, but there was a delay of around 1.5 seconds
> on the shard whose hostname was used to issue the commit; hence the overall
> commit time is 3 seconds.
> During this operation, there was no timeout or any other kind of error
> (except an `external file not found` error, which is expected).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
