[ https://issues.apache.org/jira/browse/HBASE-14062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14625902#comment-14625902 ]
Victor Xu commented on HBASE-14062:
-----------------------------------
We can see from the RS log that the META table was hosted on that RS. My guess
is that some applications use a very short client RPC timeout, or cache requests
locally before actually sending them to this RS, so by the time the requests
reach the RS they are already at or past their timeout. When the clients retry,
this request-and-fail loop continues. This could happen when a big job (e.g. tens
of thousands of map tasks using TableInputFormat) starts.
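Why that churn hurts: according to the jstack output quoted below, `closeConnection` removes from a `Collections$SynchronizedList` wrapping a `LinkedList`. That wrapper synchronizes every call on itself, and `LinkedList.remove(Object)` scans linearly, so each close holds the list's monitor for time proportional to the number of open connections, while the listener's `doAccept` (which the trace shows waiting on the same monitor) stalls. The minimal sketch below reproduces only that blocking with plain JDK code. The class, method and thread names are made up for illustration, and a `synchronized (connections)` block stands in for a slow `remove` instead of timing a real one.

```java
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

public class SynchronizedListContention {

    /** Returns the state of a "listener" thread calling add() while another thread holds the list's monitor. */
    static Thread.State observeAcceptorState() throws InterruptedException {
        // Same wrapper type as in the jstack: Collections$SynchronizedList over a LinkedList.
        List<Integer> connections = Collections.synchronizedList(new LinkedList<>());
        CountDownLatch lockHeld = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        // Stands in for a reader thread stuck inside a slow remove(): the wrapper
        // synchronizes on itself, so holding its monitor has the same effect.
        Thread reader = new Thread(() -> {
            synchronized (connections) {
                lockHeld.countDown();
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "reader");

        // Stands in for the listener adding a newly accepted connection.
        Thread acceptor = new Thread(() -> connections.add(1), "listener");

        reader.start();
        lockHeld.await();
        acceptor.start();

        // Poll until the acceptor is parked on the monitor (give up after ~5 s).
        Thread.State state = acceptor.getState();
        for (int i = 0; i < 500 && state != Thread.State.BLOCKED; i++) {
            Thread.sleep(10);
            state = acceptor.getState();
        }

        release.countDown();
        reader.join();
        acceptor.join();
        return state;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("listener state while reader holds the lock: " + observeAcceptorState());
    }
}
```

Running `main` should report the listener thread as `BLOCKED`, the same state as the `waiting for monitor entry` line in the jstack. Polling is needed because a thread only reaches `BLOCKED` once it actually tries to enter the monitor.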
> RpcServer.Listener.doAccept get blocked by LinkedList.remove
> ------------------------------------------------------------
>
> Key: HBASE-14062
> URL: https://issues.apache.org/jira/browse/HBASE-14062
> Project: HBase
> Issue Type: Bug
> Components: IPC/RPC
> Affects Versions: 0.98.12
> Reporter: Victor Xu
> Attachments: hbase.log, jstack.log
>
>
> We saw the following blocked thread in our jstack output:
> {noformat}
> "RpcServer.listener,port=60020" daemon prio=10 tid=0x00007f158097b800 nid=0x2cd05 waiting for monitor entry [0x0000000046374000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doAccept(RpcServer.java:833)
>         - waiting to lock <0x00000002bb094ac8> (a java.util.Collections$SynchronizedList)
>         at org.apache.hadoop.hbase.ipc.RpcServer$Listener.run(RpcServer.java:748)
> {noformat}
> The lock is held by a reader thread executing LinkedList.remove:
> {noformat}
> "RpcServer.reader=9,port=60020" daemon prio=10 tid=0x00007f1580394000 nid=0x2cc19 runnable [0x0000000043b4c000]
>    java.lang.Thread.State: RUNNABLE
>         at java.util.LinkedList.remove(LinkedList.java:363)
>         at java.util.Collections$SynchronizedCollection.remove(Collections.java:1639)
>         - locked <0x00000002bb094ac8> (a java.util.Collections$SynchronizedList)
>         at org.apache.hadoop.hbase.ipc.RpcServer.closeConnection(RpcServer.java:1992)
>         - locked <0x00000002bb094ac8> (a java.util.Collections$SynchronizedList)
>         at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:867)
>         at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:645)
>         - locked <0x00000002bae09a30> (a org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader)
>         at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:620)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:744)
> {noformat}
> This issue blocks the RS every once in a while, and I have to restart it
> whenever it happens. It looks like a bug. Any suggestions?
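One possible direction, offered only as a hedged sketch and not as HBase's actual fix: keep open connections in a concurrent hash set instead of a synchronized `LinkedList`, so removal is an expected O(1) hash lookup and the accept and close paths no longer serialize on a single monitor. `ConnectionRegistry`, `register` and `unregister` are hypothetical names chosen for illustration; the only real API relied on is the JDK's `ConcurrentHashMap.newKeySet()` (Java 8+).

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical registry of open connections (not an HBase class). */
public class ConnectionRegistry<C> {
    // Concurrent hash set: no global monitor, expected O(1) add/remove.
    private final Set<C> connections = ConcurrentHashMap.newKeySet();

    /** Would be called when the listener accepts a connection. */
    public void register(C conn) {
        connections.add(conn);
    }

    /** Would be called when a connection is closed; hashes instead of scanning a list. */
    public boolean unregister(C conn) {
        return connections.remove(conn);
    }

    public int size() {
        return connections.size();
    }

    public static void main(String[] args) {
        ConnectionRegistry<Object> registry = new ConnectionRegistry<>();
        Object[] conns = new Object[50_000];
        for (int i = 0; i < conns.length; i++) {
            conns[i] = new Object();
            registry.register(conns[i]);
        }
        // Closing every connection never walks a list of tens of thousands of entries.
        for (Object c : conns) {
            registry.unregister(c);
        }
        System.out.println("open connections: " + registry.size()); // prints "open connections: 0"
    }
}
```

The trade-off is that a hash set has no ordering or index access, so any server code that walks connections by position would have to iterate the set instead, and iteration over a `ConcurrentHashMap` key set is weakly consistent rather than a snapshot.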
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)