[ 
https://issues.apache.org/jira/browse/SOLR-8914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-8914:
---------------------------
    Attachment: SOLR-8914.patch

I wrote up a stress test to demonstrate the bug, and added it to the patch 
that Scott had already worked up and attached.

Scott: Prior to incorporating your changes, hammering on this stress test would 
fail within the first 20 attempts.  But with your changes I'm seeing deadlocks 
within the first 5 attempts every time I hammer on it...

{noformat}
Found one Java-level deadlock:
=============================
"zkCallback-7-thread-2-processing-n:127.0.0.1:48312_solr":
  waiting to lock monitor 0x00007f82d40076b8 (object 0x00000000ff3b5b38, a java.lang.Object),
  which is held by "zkCallback-7-thread-1-processing-n:127.0.0.1:48312_solr"
"zkCallback-7-thread-1-processing-n:127.0.0.1:48312_solr":
  waiting to lock monitor 0x00007f82d400be38 (object 0x00000000ff3b5800, a org.apache.solr.common.cloud.ZkStateReader),
  which is held by "OverseerStateUpdate-95637266046386179-127.0.0.1:48312_solr-n_0000000000"
"OverseerStateUpdate-95637266046386179-127.0.0.1:48312_solr-n_0000000000":
  waiting to lock monitor 0x00007f82d40076b8 (object 0x00000000ff3b5b38, a java.lang.Object),
  which is held by "zkCallback-7-thread-1-processing-n:127.0.0.1:48312_solr"
{noformat}
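
Reading the dump: zkCallback thread-1 holds the java.lang.Object lock and wants 
the ZkStateReader monitor, while the OverseerStateUpdate thread holds the 
ZkStateReader monitor and wants the java.lang.Object lock.  Thread-2 is only 
queued behind thread-1.  The sketch below is NOT Solr code and is not the 
stress test in the patch; it is a minimal, standalone illustration of that 
opposite-order locking pattern, with hypothetical names (readerMonitor, 
updateLock, "callback-like", "overseer-like") standing in for the real objects 
and threads.  It uses the JDK's ThreadMXBean to confirm the cycle 
programmatically, which is roughly the same check a thread dump performs.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class LockOrderDeadlockSketch {

    // Starts a daemon thread that takes `first`, waits until its peer also
    // holds a lock, then blocks trying to take `second`.
    static void lockInOrder(String name, Object first, Object second,
                            CountDownLatch bothHoldFirst) {
        Thread t = new Thread(() -> {
            synchronized (first) {
                bothHoldFirst.countDown();
                try {
                    bothHoldFirst.await();
                } catch (InterruptedException e) {
                    return;
                }
                synchronized (second) {
                    // not reached once both threads hold their first lock
                }
            }
        }, name);
        t.setDaemon(true); // let the JVM exit despite the stuck threads
        t.start();
    }

    // Creates two threads that lock the same two monitors in opposite order,
    // then polls the JVM until it reports the monitor deadlock.
    // Returns the number of deadlocked threads found, or 0 on timeout.
    static int countDeadlockedThreads() {
        Object readerMonitor = new Object(); // hypothetical stand-in for the ZkStateReader monitor
        Object updateLock = new Object();    // hypothetical stand-in for the java.lang.Object lock
        CountDownLatch bothHoldFirst = new CountDownLatch(2);

        lockInOrder("callback-like", updateLock, readerMonitor, bothHoldFirst);
        lockInOrder("overseer-like", readerMonitor, updateLock, bothHoldFirst);

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (int i = 0; i < 100; i++) {
            long[] ids = mx.findMonitorDeadlockedThreads(); // null when no deadlock
            if (ids != null) {
                return ids.length;
            }
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return 0;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println("deadlocked threads: " + countDeadlockedThreads());
    }
}
```

The usual ways out of a cycle like this are to make every code path acquire 
the two locks in the same order, or to avoid holding one while waiting for the 
other; which of those fits ZkStateReader is a question for the patch itself.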


> ZkStateReader's refreshLiveNodes(Watcher) is not thread safe
> ------------------------------------------------------------
>
>                 Key: SOLR-8914
>                 URL: https://issues.apache.org/jira/browse/SOLR-8914
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Hoss Man
>         Attachments: SOLR-8914.patch, SOLR-8914.patch, 
> jenkins.thetaphi.de_Lucene-Solr-6.x-Solaris_32.log.txt, 
> live_node_mentions_port56361_with_threadIds.log.txt, 
> live_nodes_mentions.log.txt
>
>
> Jenkins encountered a failure in TestTolerantUpdateProcessorCloud over the 
> weekend...
> {noformat}
> http://jenkins.thetaphi.de/job/Lucene-Solr-6.x-Solaris/32/consoleText
> Checking out Revision c46d7686643e7503304cb35dfe546bce9c6684e7 
> (refs/remotes/origin/branch_6x)
> Using Java: 64bit/jdk1.8.0 -XX:+UseCompressedOops -XX:+UseG1GC
> {noformat}
> The failure happened during the static setup of the test, when a 
> MiniSolrCloudCluster & several clients are initialized -- before any code 
> related to TolerantUpdateProcessor is ever used.
> I can't reproduce this, or really make sense of what I'm (not) seeing here in 
> the logs, so I'm filing this JIRA with my analysis in the hopes that someone 
> else can help make sense of it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

