[
https://issues.apache.org/jira/browse/HBASE-14926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
stack updated HBASE-14926:
--------------------------
Attachment: 14926v2.txt
Ok. Played more trying to capture thriftserver in described state. I saw the
issue in thread dump but proved elusive (wasn't sure if it my timeout that
fixed it or not). The change is pretty basic so should be safe enough...
setting a timeout on server socket... which we want anyways. Going to commit
on back of @apurtell +1. v2 adds logging of the timeout to thrift2 version too
and adds a sentence or two more to examples on how to get going with thrift
server.
> Hung ThriftServer; no timeout on read from client; if client crashes, worker
> thread gets stuck reading
> ------------------------------------------------------------------------------------------------------
>
> Key: HBASE-14926
> URL: https://issues.apache.org/jira/browse/HBASE-14926
> Project: HBase
> Issue Type: Bug
> Components: Thrift
> Affects Versions: 2.0.0, 1.2.0, 1.1.2, 1.3.0, 1.0.3, 0.98.16
> Reporter: stack
> Assignee: stack
> Attachments: 14926.patch, 14926v2.txt
>
>
> Thrift server is hung. All worker threads are doing this:
> {code}
> "thrift-worker-0" daemon prio=10 tid=0x00007f0bb95c2800 nid=0xf6a7 runnable
> [0x00007f0b956e0000]
> java.lang.Thread.State: RUNNABLE
> at java.net.SocketInputStream.socketRead0(Native Method)
> at java.net.SocketInputStream.read(SocketInputStream.java:152)
> at java.net.SocketInputStream.read(SocketInputStream.java:122)
> at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
> at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
> - locked <0x000000066d859490> (a java.io.BufferedInputStream)
> at
> org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
> at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
> at
> org.apache.thrift.transport.TFramedTransport.readFrame(TFramedTransport.java:129)
> at
> org.apache.thrift.transport.TFramedTransport.read(TFramedTransport.java:101)
> at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
> at
> org.apache.thrift.protocol.TCompactProtocol.readByte(TCompactProtocol.java:601)
> at
> org.apache.thrift.protocol.TCompactProtocol.readMessageBegin(TCompactProtocol.java:470)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:27)
> at
> org.apache.hadoop.hbase.thrift.TBoundedThreadPoolServer$ClientConnnection.run(TBoundedThreadPoolServer.java:289)
> at
> org.apache.hadoop.hbase.thrift.CallQueue$Call.run(CallQueue.java:64)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> They never recover.
> I don't have client side logs.
> We've been here before: HBASE-4967 "connected client thrift sockets should
> have a server side read timeout" but this patch only got applied to fb branch
> (and thrift has changed since then).
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)