[ 
https://issues.apache.org/jira/browse/HBASE-14474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14906692#comment-14906692
 ] 

stack commented on HBASE-14474:
-------------------------------

That is pretty bad if the JVM itself can detect the deadlock. Good one, [~enis].

Looks like there have been lots of cooks in the connection close code 
recently. I said I'd write a test over in HBASE-14313 but didn't (all talk!). 
Given all the machinations around close that have gone in without a test, it 
looks like we need one.

The HBASE-14313 addendum moved the close back in under the lock. I missed that 
in review (I should have looked at the patch in a broader context), but at 
least it ran conditionally. Then HBASE-14449 changed it so we always ran the 
close code, whether or not it had already been run, which made the lockup more 
likely.
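
To make the lock ordering concrete, here is a minimal sketch of the safe shape: only flip the flag while holding the write lock, and call close() after that lock has been released. This is not the RpcClientImpl code. The class, the outLock field, and the runOnce helper are made-up names for illustration, and the check relies on ThreadMXBean.findDeadlockedThreads(), which returns null when no threads are deadlocked.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicBoolean;

public class CloseOutsideLockSketch {

  /** Hypothetical stand-in for a connection; not the real RpcClientImpl.Connection. */
  static class Conn {
    private final Object outLock = new Object();            // guards the output stream
    private final AtomicBoolean shouldClose = new AtomicBoolean(false);
    private final AtomicBoolean closed = new AtomicBoolean(false);

    /** Writer path: a failed write only sets the flag while outLock is held. */
    void writeRequest(boolean failWrite) {
      synchronized (outLock) {
        if (shouldClose.get()) {
          return;                                           // next in line won't even try to write
        }
        if (failWrite) {
          shouldClose.set(true);                            // no close() here, we still hold outLock
        }
      }
      if (shouldClose.get()) {
        close();                                            // outLock released before taking the monitor
      }
    }

    /** Close path: takes the connection monitor first, then outLock. */
    synchronized void close() {
      if (!closed.compareAndSet(false, true)) {
        return;                                             // already closed by another thread
      }
      synchronized (outLock) {
        // the output stream would be released here
      }
    }

    boolean isClosed() {
      return closed.get();
    }
  }

  /** Races a failing writer against a closer; true if both finish and no deadlock is reported. */
  static boolean runOnce() {
    Conn c = new Conn();
    Thread writer = new Thread(() -> c.writeRequest(true), "writer");
    Thread closer = new Thread(c::close, "closer");
    writer.start();
    closer.start();
    try {
      writer.join(5000);
      closer.join(5000);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
    long[] deadlocked = ManagementFactory.getThreadMXBean().findDeadlockedThreads();
    return deadlocked == null && c.isClosed() && !writer.isAlive() && !closer.isAlive();
  }

  public static void main(String[] args) {
    for (int i = 0; i < 1000; i++) {
      if (!runOnce()) {
        throw new AssertionError("threads stuck or deadlock detected on iteration " + i);
      }
    }
    System.out.println("no deadlock detected");
  }
}
```

If writeRequest called close() inside the synchronized (outLock) block instead, the writer would hold outLock while waiting for the connection monitor and the closer would hold the monitor while waiting for outLock, which is the cycle in the thread dump on this issue. A real test would need to drive the actual connection code rather than a stand-in like this.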

The original Jurriaan Mous code didn't do the close....

    // We set the value inside the synchronized block, this way the next in line
    //  won't even try to write
    shouldCloseConnection.set(true);
    writeException = e;

On your patch,

    //  won't even try to write. Otherwise we might miss a call in the calls map?
    shouldCloseConnection.set(true);

... wouldn't you want to call markClose instead of the above?

Agree with the undoings of HBASE-14449.

Patch looks good.

[~eclark] Why are you not seeing this lockup? Because you don't have HBASE-14449?

Nice one [~enis] 

> DeadLock in RpcClientImpl.Connection.close() 
> ---------------------------------------------
>
>                 Key: HBASE-14474
>                 URL: https://issues.apache.org/jira/browse/HBASE-14474
>             Project: HBase
>          Issue Type: Bug
>          Components: rpc
>            Reporter: Enis Soztutar
>            Assignee: Enis Soztutar
>            Priority: Blocker
>             Fix For: 2.0.0, 1.2.0, 1.0.3, 1.1.3
>
>         Attachments: hbase-14474_v1.patch, hbase-14474_v2.patch, 
> hbase-14474_v3.patch
>
>
> From a code base that contains 1.1.2 + HBASE-14449 + HBASE-14241 and 
> HBASE-14313, we can easily reproduce a deadlock with the serverKilling CM: 
> {code}
> Found one Java-level deadlock:
> =============================
> "htable-pool1-t63":
>   waiting to lock monitor 0x0000000001cb1688 (object 0x00000000806ef150, a org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection),
>   which is held by "IPC Client (1403704789) connection to enis-hbase-sep-21-6.novalocal/172.22.107.106:16020 from root"
> "IPC Client (1403704789) connection to enis-hbase-sep-21-6.novalocal/172.22.107.106:16020 from root":
>   waiting to lock monitor 0x0000000001cb1738 (object 0x00000000806f0c60, a java.lang.Object),
>   which is held by "htable-pool1-t63"
> Java stack information for the threads listed above:
> ===================================================
> "htable-pool1-t63":
>       at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.close(RpcClientImpl.java:819)
>       - waiting to lock <0x00000000806ef150> (a org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection)
>       at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.writeRequest(RpcClientImpl.java:906)
>       - locked <0x00000000806f0c60> (a java.lang.Object)
>       at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.tracedWriteRequest(RpcClientImpl.java:856)
>       at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1192)
>       at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:213)
>       at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:287)
>       at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.multi(ClientProtos.java:32699)
>       at org.apache.hadoop.hbase.client.MultiServerCallable.call(MultiServerCallable.java:129)
>       at org.apache.hadoop.hbase.client.MultiServerCallable.call(MultiServerCallable.java:54)
>       at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
>       at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl$SingleServerRequestRunnable.run(AsyncProcess.java:708)
>       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>       at java.lang.Thread.run(Thread.java:745)
> "IPC Client (1403704789) connection to enis-hbase-sep-21-6.novalocal/172.22.107.106:16020 from root":
>       at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.close(RpcClientImpl.java:832)
>       - waiting to lock <0x00000000806f0c60> (a java.lang.Object)
>       - locked <0x00000000806ef150> (a org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection)
>       at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.run(RpcClientImpl.java:574)
> {code}



