[
https://issues.apache.org/jira/browse/HBASE-11277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14014065#comment-14014065
]
Andrew Purtell commented on HBASE-11277:
----------------------------------------
I'd say the runaway loop is in RpcServer.Connection#readAndProcess(). It is
built around a top-level while (true) loop, so if we ever miss all of the
coded exit conditions we will iterate forever.
Line 1488 corresponds to a call to channelRead(). This is after "We have read a
length and we have read the preamble. It is either the connection header or it
is a request." Given the observed behavior, I think we are going to the
unconditional else clause where "More to read still; go around again.", and
going around, and around, and around.
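To make the suspected failure mode concrete, here is a minimal sketch. It is not the RpcServer source: ReadLoopSketch, EmptyChannel, unguardedLoop and guardedLoop are hypothetical names, and it assumes the channel keeps returning 0 bytes while the buffer still has room, which is my guess at what happens rather than something the thread dump proves. The unguarded version only stops because of a demo-only iteration cap; the guarded version returns as soon as a read makes no progress.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;

public class ReadLoopSketch {
    /** Hypothetical channel that never has data, like a non-blocking socket with nothing buffered. */
    public static final class EmptyChannel implements ReadableByteChannel {
        @Override public int read(ByteBuffer dst) { return 0; }
        @Override public boolean isOpen() { return true; }
        @Override public void close() { }
    }

    /**
     * Models a while (true) loop whose only exits are EOF and "buffer full".
     * maxIterations is a demo-only cap so the example terminates; without it,
     * a channel that keeps returning 0 would spin here forever.
     * Returns the number of passes through the loop.
     */
    public static int unguardedLoop(ReadableByteChannel ch, ByteBuffer data, int maxIterations)
            throws IOException {
        int iterations = 0;
        while (true) {
            iterations++;
            int count = ch.read(data);
            if (count < 0) return iterations;              // EOF: connection closed
            if (data.remaining() == 0) return iterations;  // buffer full: process it
            if (iterations >= maxIterations) return iterations; // demo-only cap
            // More to read still; go around again.
        }
    }

    /** Same loop, but a read that makes no progress hands control back to the caller. */
    public static int guardedLoop(ReadableByteChannel ch, ByteBuffer data) throws IOException {
        int iterations = 0;
        while (true) {
            iterations++;
            int count = ch.read(data);
            if (count < 0) return iterations;              // EOF: connection closed
            if (data.remaining() == 0) return iterations;  // buffer full: process it
            if (count == 0) return iterations;             // no progress: wait for the selector
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(unguardedLoop(new EmptyChannel(), ByteBuffer.allocate(8), 1000));
        System.out.println(guardedLoop(new EmptyChannel(), ByteBuffer.allocate(8)));
    }
}
```

The point of the guard is only to show one possible exit for the "no bytes read, buffer not full" case; whether that is the right change for RpcServer would need to be checked against the actual code.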
> RPCServer threads can wedge under high load
> -------------------------------------------
>
> Key: HBASE-11277
> URL: https://issues.apache.org/jira/browse/HBASE-11277
> Project: HBase
> Issue Type: Bug
> Reporter: Andrew Purtell
>
> This is with 0.98.0 in an insecure setup with Java 7u55 and 7u60. Under high
> load, RPCServer threads can wedge, fail to make progress, and consume 100% CPU
> time on a core indefinitely.
> Dumping threads, all threads are in BLOCKED or IN_NATIVE state. The IN_NATIVE
> threads are mostly in EPollArrayWrapper.epollWait or
> FileDispatcherImpl.read0. The number of threads found in
> FileDispatcherImpl.read0 corresponds to the number of runaway threads expected
> based on looking at 'top' output. These look like:
> {noformat}
> Thread 64758: (state = IN_NATIVE)
> - sun.nio.ch.FileDispatcherImpl.read0(java.io.FileDescriptor, long, int)
> @bci=0 (Compiled frame; information may be imprecise)
> - sun.nio.ch.SocketDispatcher.read(java.io.FileDescriptor, long, int)
> @bci=4, line=39 (Compiled frame)
> - sun.nio.ch.IOUtil.readIntoNativeBuffer(java.io.FileDescriptor,
> java.nio.ByteBuffer, long, sun.nio.ch.NativeDispatcher) @bci=114, line=223
> (Compiled frame)
> - sun.nio.ch.IOUtil.read(java.io.FileDescriptor, java.nio.ByteBuffer, long,
> sun.nio.ch.NativeDispatcher) @bci=48, line=197 (Compiled frame)
> - sun.nio.ch.SocketChannelImpl.read(java.nio.ByteBuffer) @bci=234, line=379
> (Compiled frame)
> -
> org.apache.hadoop.hbase.ipc.RpcServer.channelRead(java.nio.channels.ReadableByteChannel,
> java.nio.ByteBuffer) @bci=12, line=2224 (Compiled frame)
> - org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess()
> @bci=509, line=1488 (Compiled frame)
> -
> org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(java.nio.channels.SelectionKey)
> @bci=23, line=790 (Compiled frame)
> - org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop() @bci=97,
> line=581 (Compiled frame)
> - org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run() @bci=1,
> line=556 (Interpreted frame)
> -
> java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker)
> @bci=95, line=1145 (Interpreted frame)
> - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=615
> (Interpreted frame)
> - java.lang.Thread.run() @bci=11, line=745 (Interpreted frame)
> {noformat}
--
This message was sent by Atlassian JIRA
(v6.2#6252)