[
https://issues.apache.org/jira/browse/SOLR-7571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Erick Erickson resolved SOLR-7571.
----------------------------------
Resolution: Duplicate
SOLR-7344 is a much better approach; Solr should survive ill-mannered clients.
> Return metrics with update requests to allow clients to self-throttle
> ---------------------------------------------------------------------
>
> Key: SOLR-7571
> URL: https://issues.apache.org/jira/browse/SOLR-7571
> Project: Solr
> Issue Type: Improvement
> Affects Versions: 4.10.3
> Reporter: Erick Erickson
> Assignee: Erick Erickson
>
> I've assigned this to myself to keep track of it; anyone who wants it,
> please feel free to take it.
> I've recently seen a setup with 10 shards and 4 replicas. The SolrJ client
> (and post.jar for json files for that matter) firehose updates (150 separate
> threads in total) at Solr. Eventually, replicas (not leaders) go into
> recovery and the state cascades and eventually the entire cluster becomes
> unusable. SOLR-5850 delays the behavior, but it still occurs. There are no
> errors in the follower's logs; this is leader-initiated recovery because of
> a timeout.
> I think the root problem is that the client is just sending too many requests
> to the cluster, overwhelming ConcurrentUpdateSolrClient/Server (used by the
> leader to distribute update requests to all the followers). This was observed
> in Solr 4.10.3+. I see thread counts of 500+ when this happens.
> So assuming that this is the root cause, the obvious "cure" is "don't index
> that fast". This is unsatisfactory because "that fast" is variable; the only
> recourse is to set the threshold so low that the Solr cluster isn't being
> driven as fast as it can be.
> We should provide some mechanism for having the client throttle itself. The
> number of outstanding update threads is one possibility. The client could
> then slow down sending updates to Solr.
> I'm not sure there's a good way to deal with this on the server. Once the
> timeout is encountered, you don't know whether the doc has actually been
> indexed on the follower (actually, in this case it _is_ indexed, it just takes
> a while). Ideally we'd just manage it all magically, but an alternative that
> lets clients dynamically throttle themselves seems doable.
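
To make the self-throttling idea in the description concrete, here is a
minimal client-side sketch. It assumes each update response carried a
load figure such as the number of outstanding update threads, which is
what the issue proposes and not something Solr is known to return. The
names UpdateThrottle, index_all and send_batch, and all threshold values,
are hypothetical.

```python
import time


class UpdateThrottle:
    """Adapts the pause between update batches to a server-reported load figure."""

    def __init__(self, high_water=100, max_delay=5.0, step=0.05, backoff=2.0):
        self.high_water = high_water  # assumed "too busy" thread count
        self.max_delay = max_delay    # never pause longer than this (seconds)
        self.step = step              # first pause, and the linear decrease
        self.backoff = backoff        # multiplicative increase under load
        self.delay = 0.0

    def observe(self, outstanding_threads):
        """Update and return the pause, given the latest reported thread count."""
        if outstanding_threads > self.high_water:
            # Server is overloaded: back off multiplicatively.
            grown = max(self.step, self.delay * self.backoff)
            self.delay = min(self.max_delay, grown)
        else:
            # Server is keeping up: speed back up one step at a time.
            self.delay = max(0.0, self.delay - self.step)
        return self.delay


def index_all(batches, send_batch, throttle, sleep=time.sleep):
    """Send each batch, then pause according to the throttle.

    send_batch is a hypothetical callable that posts one batch and returns
    the thread count reported with the response.
    """
    for batch in batches:
        reported = send_batch(batch)
        pause = throttle.observe(reported)
        if pause > 0:
            sleep(pause)
```

The pause grows quickly while the reported count stays above the
threshold and shrinks gradually once it drops, so a client would settle
near whatever rate the cluster can sustain instead of relying on a fixed
"don't index that fast" limit.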
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)