[jira] [Commented] (HBASE-14062) RpcServer.Listener.doAccept get blocked by LinkedList.remove

Victor Xu (JIRA) Mon, 13 Jul 2015 23:19:53 -0700

    [ https://issues.apache.org/jira/browse/HBASE-14062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14625902#comment-14625902 ]


Victor Xu commented on HBASE-14062:
-----------------------------------

We can see from the RS log that the META table is hosted on that RS. My guess 
is that some applications use a very short client RPC timeout, or queue 
requests locally before actually sending them to this RS, so that by the time 
the requests reach the RS they have almost exhausted their timeout. When the 
clients retry, this request-and-fail loop continues. This could happen when a 
big job (tens of thousands of map tasks using TableInputFormat) starts.

> RpcServer.Listener.doAccept get blocked by LinkedList.remove
> ------------------------------------------------------------
>
>                 Key: HBASE-14062
>                 URL: https://issues.apache.org/jira/browse/HBASE-14062
>             Project: HBase
>          Issue Type: Bug
>          Components: IPC/RPC
>    Affects Versions: 0.98.12
>            Reporter: Victor Xu
>         Attachments: hbase.log, jstack.log
>
>
> We saw this blocked thread in our jstack output:
> {noformat}
> "RpcServer.listener,port=60020" daemon prio=10 tid=0x00007f158097b800 nid=0x2cd05 waiting for monitor entry [0x0000000046374000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doAccept(RpcServer.java:833)
>         - waiting to lock <0x00000002bb094ac8> (a java.util.Collections$SynchronizedList)
>         at org.apache.hadoop.hbase.ipc.RpcServer$Listener.run(RpcServer.java:748)
> {noformat}
> The lock is held by a reader thread that is inside LinkedList.remove:
> {noformat}
> "RpcServer.reader=9,port=60020" daemon prio=10 tid=0x00007f1580394000 nid=0x2cc19 runnable [0x0000000043b4c000]
>    java.lang.Thread.State: RUNNABLE
>         at java.util.LinkedList.remove(LinkedList.java:363)
>         at java.util.Collections$SynchronizedCollection.remove(Collections.java:1639)
>         - locked <0x00000002bb094ac8> (a java.util.Collections$SynchronizedList)
>         at org.apache.hadoop.hbase.ipc.RpcServer.closeConnection(RpcServer.java:1992)
>         - locked <0x00000002bb094ac8> (a java.util.Collections$SynchronizedList)
>         at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:867)
>         at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:645)
>         - locked <0x00000002bae09a30> (a org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader)
>         at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:620)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:744)
> {noformat}
> This issue blocks the RS once in a while, and I have to restart it whenever 
> it happens. It seems like a bug. Any suggestions?
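
To illustrate the contention visible in the two traces, here is a minimal, 
standalone Java sketch. It is not HBase code: Conn, ConnectionListSketch, 
newConns and drain are hypothetical names, and the connection count is an 
assumption. It only shows that remove(Object) on a synchronized LinkedList 
scans the list from the head while holding the list's monitor, so a burst of 
connection closes can keep other threads (such as the listener in doAccept) 
waiting on that monitor. The hash-based concurrent set is shown purely as one 
possible alternative for comparison, not as the fix HBase adopted.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for an RPC connection; this is not HBase's Connection class.
final class Conn {
    final int id;
    Conn(int id) { this.id = id; }
}

class ConnectionListSketch {
    // Build n distinct connections in insertion order.
    static List<Conn> newConns(int n) {
        List<Conn> conns = new ArrayList<>(n);
        for (int i = 0; i < n; i++) conns.add(new Conn(i));
        return conns;
    }

    // Remove every element of 'order' from 'conns' one at a time, the way a
    // closeConnection-style call would, and return how many elements are left.
    static int drain(Collection<Conn> conns, List<Conn> order) {
        for (Conn c : order) conns.remove(c);
        return conns.size();
    }

    public static void main(String[] args) {
        int n = 10_000; // assumed number of open connections
        List<Conn> order = newConns(n);

        // Close the newest connections first: the worst case for a head-to-tail scan.
        List<Conn> newestFirst = new ArrayList<>(order);
        Collections.reverse(newestFirst);

        // Same shape as in the trace: each remove(Object) walks the linked list
        // while holding the SynchronizedList monitor.
        Collection<Conn> list = Collections.synchronizedList(new LinkedList<>(order));
        long t0 = System.nanoTime();
        drain(list, newestFirst);
        long listNanos = System.nanoTime() - t0;

        // Comparison only: a concurrent hash set removes by hash lookup, no scan.
        Collection<Conn> set = ConcurrentHashMap.newKeySet();
        set.addAll(order);
        long t1 = System.nanoTime();
        drain(set, newestFirst);
        long setNanos = System.nanoTime() - t1;

        System.out.printf("synchronized LinkedList: %d ms%n", listNanos / 1_000_000);
        System.out.printf("concurrent hash set:     %d ms%n", setNanos / 1_000_000);
    }
}
```

The absolute timings depend on the JVM and machine; the point is only that 
the cost of the linked-list removals grows with the number of open 
connections, and all of that time is spent inside the lock that doAccept 
also needs.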



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
