[ 
https://issues.apache.org/jira/browse/HBASE-14474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14908709#comment-14908709
 ] 

Enis Soztutar commented on HBASE-14474:
---------------------------------------

bq. If I read the code correctly, the call to shouldCloseConnection.set(true) 
around line 919 would result in markClosed() outside the synchronized block 
returning false because shouldCloseConnection is true by that time.
Right. But we cannot skip calling shouldCloseConnection in that synchronized 
block I think. The problem is that writeRequest() is called by different 
threads and the logic there corresponding to this comment: 
{code}
          // We set the value inside the synchronized block, this way the next 
in line
          //  won't even try to write. Otherwise we might miss a call in the 
calls map?
{code}
means that the assumption is that before giving the outLock, we have to ensure 
that no more calls can be added to the {{calls}} map. 

> DeadLock in RpcClientImpl.Connection.close() 
> ---------------------------------------------
>
>                 Key: HBASE-14474
>                 URL: https://issues.apache.org/jira/browse/HBASE-14474
>             Project: HBase
>          Issue Type: Bug
>          Components: rpc
>            Reporter: Enis Soztutar
>            Assignee: Enis Soztutar
>            Priority: Blocker
>             Fix For: 2.0.0, 1.2.0, 1.0.3, 1.1.3
>
>         Attachments: 14474-addendum.txt, hbase-14474_v1.patch, 
> hbase-14474_v2.patch, hbase-14474_v3.patch
>
>
> From a code base that contains 1.1.2 + HBASE-14449 + HBASE-14241 and 
> HBASE-14313 we can reproduce a dead lock with serverKilling CM easily: 
> {code}
> Found one Java-level deadlock:
> =============================
> "htable-pool1-t63":
>   waiting to lock monitor 0x0000000001cb1688 (object 0x00000000806ef150, a 
> org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection),
>   which is held by "IPC Client (1403704789) connection to 
> enis-hbase-sep-21-6.novalocal/172.22.107.106:16020 from root"
> "IPC Client (1403704789) connection to 
> enis-hbase-sep-21-6.novalocal/172.22.107.106:16020 from root":
>   waiting to lock monitor 0x0000000001cb1738 (object 0x00000000806f0c60, a 
> java.lang.Object),
>   which is held by "htable-pool1-t63"
> Java stack information for the threads listed above:
> ===================================================
> "htable-pool1-t63":
>       at 
> org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.close(RpcClientImpl.java:819)
>       - waiting to lock <0x00000000806ef150> (a 
> org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection)
>       at 
> org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.writeRequest(RpcClientImpl.java:906)
>       - locked <0x00000000806f0c60> (a java.lang.Object)
>       at 
> org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.tracedWriteRequest(RpcClientImpl.java:856)
>       at 
> org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1192)
>       at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:213)
>       at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:287)
>       at 
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.multi(ClientProtos.java:32699)
>       at 
> org.apache.hadoop.hbase.client.MultiServerCallable.call(MultiServerCallable.java:129)
>       at 
> org.apache.hadoop.hbase.client.MultiServerCallable.call(MultiServerCallable.java:54)
>       at 
> org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
>       at 
> org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl$SingleServerRequestRunnable.run(AsyncProcess.java:708)
>       at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>       at java.lang.Thread.run(Thread.java:745)
> "IPC Client (1403704789) connection to 
> enis-hbase-sep-21-6.novalocal/172.22.107.106:16020 from root":
>       at 
> org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.close(RpcClientImpl.java:832)
>       - waiting to lock <0x00000000806f0c60> (a java.lang.Object)
>       - locked <0x00000000806ef150> (a 
> org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection)
>       at 
> org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.run(RpcClientImpl.java:574)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to