[jira] [Updated] (SOLR-8914) ZkStateReader's refreshLiveNodes(Watcher) is not thread safe

Hoss Man (JIRA) Tue, 29 Mar 2016 15:30:43 -0700

     [ 
https://issues.apache.org/jira/browse/SOLR-8914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Hoss Man updated SOLR-8914:
---------------------------
    Description: 

Jenkins encountered a failure in TestTolerantUpdateProcessorCloud over the 
weekend:

{noformat}
http://jenkins.thetaphi.de/job/Lucene-Solr-6.x-Solaris/32/consoleText
Checking out Revision c46d7686643e7503304cb35dfe546bce9c6684e7 
(refs/remotes/origin/branch_6x)
Using Java: 64bit/jdk1.8.0 -XX:+UseCompressedOops -XX:+UseG1GC
{noformat}

The failure happened during the static setup of the test, when a 
MiniSolrCloudCluster & several clients are initialized -- before any code 
related to TolerantUpdateProcessor is ever used.

I can't reproduce this, or really make sense of what I'm (not) seeing here in 
the logs, so I'm filing this Jira issue with my analysis in the hope that 
someone else can help make sense of it.



        Summary: ZkStateReader's refreshLiveNodes(Watcher) is not thread safe  
(was: inexplicable "no servers hosting shard: shard2" using 
MiniSolrCloudCluster)

Updating summary.

I suspect we need to either move the {{zkClient.getChildren(...)}} call inside 
the existing {{synchronized (getUpdateLock())}} block, or synchronize the entire 
{{refreshLiveNodes(Watcher watcher)}} method on some new "liveNodesLock".
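
To illustrate the kind of interleaving I suspect, here is a minimal, self-contained sketch. It is NOT the actual ZkStateReader code: the class {{LiveNodesRaceSketch}} and its methods ({{registerNode}}, {{fetchChildren}}, {{publish}}, {{refreshSafely}}) are hypothetical names, and an in-memory set stands in for the children of the live nodes znode. The two "threads" are replayed sequentially so the outcome is deterministic.

```java
import java.util.Set;
import java.util.TreeSet;

public class LiveNodesRaceSketch {
    // Hypothetical in-memory stand-in for the children of the live nodes znode.
    private final Set<String> zkChildren = new TreeSet<>();
    private final Object updateLock = new Object();
    private Set<String> liveNodes = new TreeSet<>();

    // Simulates a new node registering itself in "ZooKeeper".
    public synchronized void registerNode(String name) {
        zkChildren.add(name);
    }

    // Unsafe pattern, step 1: read the children with the update lock NOT held.
    public synchronized Set<String> fetchChildren() {
        return new TreeSet<>(zkChildren);
    }

    // Unsafe pattern, step 2: publish whatever was fetched earlier, however old.
    public void publish(Set<String> fetched) {
        synchronized (updateLock) {
            liveNodes = fetched;
        }
    }

    // Suggested pattern: fetch and publish under the same lock, so a stale
    // fetch can never overwrite the result of a newer one.
    public void refreshSafely() {
        synchronized (updateLock) {
            liveNodes = fetchChildren();
        }
    }

    public Set<String> getLiveNodes() {
        synchronized (updateLock) {
            return new TreeSet<>(liveNodes);
        }
    }

    public static void main(String[] args) {
        LiveNodesRaceSketch reader = new LiveNodesRaceSketch();
        reader.registerNode("node1");

        Set<String> staleFetch = reader.fetchChildren(); // "thread A" reads [node1]
        reader.registerNode("node2");                    // a second node comes up
        reader.publish(reader.fetchChildren());          // "thread B" reads and publishes both
        reader.publish(staleFetch);                      // "thread A" publishes late

        System.out.println(reader.getLiveNodes());       // prints [node1]: node2 was lost

        reader.refreshSafely();
        System.out.println(reader.getLiveNodes());       // prints [node1, node2]
    }
}
```

In the sketch, the late {{publish(staleFetch)}} call silently drops node2 even though each individual step is synchronized. Doing the read inside the lock, as in {{refreshSafely()}}, closes that window, at the cost of holding the lock across the (in real life, remote) read.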

> ZkStateReader's refreshLiveNodes(Watcher) is not thread safe
> ------------------------------------------------------------
>
>                 Key: SOLR-8914
>                 URL: https://issues.apache.org/jira/browse/SOLR-8914
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Hoss Man
>         Attachments: jenkins.thetaphi.de_Lucene-Solr-6.x-Solaris_32.log.txt, 
> live_node_mentions_port56361_with_threadIds.log.txt, 
> live_nodes_mentions.log.txt
>
>
> Jenkins encountered a failure in TestTolerantUpdateProcessorCloud over the 
> weekend:
> {noformat}
> http://jenkins.thetaphi.de/job/Lucene-Solr-6.x-Solaris/32/consoleText
> Checking out Revision c46d7686643e7503304cb35dfe546bce9c6684e7 
> (refs/remotes/origin/branch_6x)
> Using Java: 64bit/jdk1.8.0 -XX:+UseCompressedOops -XX:+UseG1GC
> {noformat}
> The failure happened during the static setup of the test, when a 
> MiniSolrCloudCluster & several clients are initialized -- before any code 
> related to TolerantUpdateProcessor is ever used.
> I can't reproduce this, or really make sense of what I'm (not) seeing here in 
> the logs, so I'm filing this Jira issue with my analysis in the hope that 
> someone else can help make sense of it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

