[ https://issues.apache.org/jira/browse/SOLR-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089118#comment-14089118 ]
Ramkumar Aiyengar commented on SOLR-6261:
-----------------------------------------
Thinking this through further, the problem comes down to the fact that
{{peekTopN}} leaks watches, potentially several per call. Each
{{getChildren}} call creates one, and the watch is thrown away once there are
results or a timeout is reached on an empty queue. There is currently no way
to remove watches from ZK (hopefully something to look forward to once 3.5 is
released and ZOOKEEPER-1829 gets in). Put this call in an event loop and you
have a big issue.
I guess we do need timeouts to check leadership every now and then, but at
that point peek should have a way to see whether it has already created a
watch and wait on it, instead of fetching the children again.
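A minimal Java sketch of "wait on the watch you already created", assuming a toy in-memory stand-in for ZK rather than the real client. `FakeZk`, `WatchReusingQueue` and `peek` are hypothetical names, not Solr or ZooKeeper APIs. A watch is modelled as a one-shot `Runnable` that counts down a `CountDownLatch`; `peek` registers a new watch only when no earlier one is still outstanding, and otherwise reads the children without watching and waits on the existing latch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical in-memory stand-in for a ZK node's children. Like a ZK child
// watch, each watch passed to getChildren fires at most once, on the next change.
class FakeZk {
    private final List<String> children = new ArrayList<>();
    private final List<Runnable> watches = new ArrayList<>();

    synchronized List<String> getChildren(Runnable watch) {
        if (watch != null) watches.add(watch); // null means "read without watching"
        return new ArrayList<>(children);
    }

    synchronized int registeredWatchCount() { return watches.size(); }

    void create(String child) {
        List<Runnable> toFire;
        synchronized (this) {
            children.add(child);
            toFire = new ArrayList<>(watches);
            watches.clear(); // one-shot semantics: fired watches are consumed
        }
        toFire.forEach(Runnable::run);
    }
}

// Hypothetical peek that keeps at most one outstanding watch and waits on it
// across calls, instead of registering a new watch on every getChildren.
class WatchReusingQueue {
    private final FakeZk zk;
    private CountDownLatch pending; // latch for the watch we set earlier, if any

    WatchReusingQueue(FakeZk zk) { this.zk = zk; }

    List<String> peek(long timeoutMs) {
        CountDownLatch latch;
        synchronized (this) {
            boolean haveLiveWatch = pending != null && pending.getCount() > 0;
            List<String> kids;
            if (haveLiveWatch) {
                kids = zk.getChildren(null); // reuse the outstanding watch
            } else {
                CountDownLatch fresh = new CountDownLatch(1);
                pending = fresh;
                kids = zk.getChildren(fresh::countDown); // the only place a watch is added
            }
            if (!kids.isEmpty()) return kids;
            latch = pending;
        }
        try {
            if (!latch.await(timeoutMs, TimeUnit.MILLISECONDS)) {
                return List.of(); // timed out; the watch stays pending for the next call
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return List.of();
        }
        return zk.getChildren(null); // watch fired; the next peek sets a fresh one
    }
}
```

With this shape, repeated timed-out peeks on an empty queue leave exactly one registered watch, whereas passing a fresh watch to every `getChildren` would leave one per call.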
> Run ZK watch event callbacks in parallel to the event thread
> ------------------------------------------------------------
>
> Key: SOLR-6261
> URL: https://issues.apache.org/jira/browse/SOLR-6261
> Project: Solr
> Issue Type: Improvement
> Components: SolrCloud
> Affects Versions: 4.9
> Reporter: Ramkumar Aiyengar
> Assignee: Mark Miller
> Priority: Minor
> Fix For: 5.0, 4.10
>
>
> Currently, checking for leadership (triggered by the leader's ephemeral node
> going away) happens in ZK's event thread. If there are many cores and all of
> them are due to take over leadership, they have to go through the two-way
> sync and leadership takeover serially.
> For tens of cores, this could mean 30-40s without leadership before the last
> in the list even gets to start the leadership process. If the leadership
> process happens in a separate thread, then the cores could all take over in
> parallel.
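A minimal Java sketch of moving the slow work off the event thread, assuming the callback can be modelled as a `Consumer<String>` receiving the changed node's path (a stand-in for a real ZK `Watcher`; `ParallelDispatcher`, `wrap` and `shutdownAndWait` are hypothetical names). The event thread only enqueues each callback on a fixed thread pool and returns, so several cores' leadership checks can run at once.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical helper: instead of running a slow leadership check on the
// event thread, the event thread just submits it to a pool and returns.
class ParallelDispatcher {
    private final ExecutorService pool;

    ParallelDispatcher(int threads) {
        this.pool = Executors.newFixedThreadPool(threads);
    }

    // Wraps a handler so that invoking it only enqueues the work on the pool.
    Consumer<String> wrap(Consumer<String> handler) {
        return path -> pool.execute(() -> handler.accept(path));
    }

    // Stops accepting new work and waits for in-flight handlers to finish.
    boolean shutdownAndWait(long timeoutMs) {
        pool.shutdown();
        try {
            return pool.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

This sketch makes no attempt to preserve ordering between events for the same node, and a fixed pool caps how many takeovers run concurrently; both are choices a real implementation would need to weigh.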
--
This message was sent by Atlassian JIRA
(v6.2#6252)