Andrew Purtell created HBASE-10121:
--------------------------------------
Summary: Abort wedged Calls after a timeout
Key: HBASE-10121
URL: https://issues.apache.org/jira/browse/HBASE-10121
Project: HBase
Issue Type: Bug
Affects Versions: 0.94.11
Reporter: Andrew Purtell
Attachments: screenshot.jpg
Saw this in a mail to user@:
"REPL IPC Server handler $N on $PORT WAITING Waiting for a call (since 22 hrs,
57mins, 38sec ago)"
I don't think this is a TCP-level issue, since we enable keepalives on
connections by default. Either we failed to remove the call upon an exception,
or the remote is alive but not sending.
Looking at the IPC server code, I don't see where we abort and clean up wedged
Calls after some timeout. Regardless of the other issues here, should we do
that?
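
To make the question concrete, here is a minimal sketch of what "abort and clean
up wedged Calls after some timeout" could look like. It is not taken from the
HBase IPC server code: WedgedCallReaper, TrackedCall, and every method name
below are hypothetical, and the caller passes the clock value in explicitly so
the sketch stays self-contained.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch only; these are not HBase's actual IPC classes. */
class WedgedCallReaper {

    /** Minimal stand-in for an in-flight call tracked by the server. */
    static final class TrackedCall {
        final int id;
        final long startMillis;
        volatile boolean aborted;

        TrackedCall(int id, long startMillis) {
            this.id = id;
            this.startMillis = startMillis;
        }
    }

    private final Map<Integer, TrackedCall> inFlight = new ConcurrentHashMap<>();
    private final long timeoutMillis;

    WedgedCallReaper(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Record a call when the server starts working on it. */
    void register(int id, long nowMillis) {
        inFlight.put(id, new TrackedCall(id, nowMillis));
    }

    /** Normal path: the call finished (or failed) and is removed. */
    void complete(int id) {
        inFlight.remove(id);
    }

    /**
     * Abort and remove every call that has been in flight for at least
     * timeoutMillis. Returns the ids of the calls that were aborted.
     */
    List<Integer> abortWedged(long nowMillis) {
        List<Integer> abortedIds = new ArrayList<>();
        Iterator<TrackedCall> it = inFlight.values().iterator();
        while (it.hasNext()) {
            TrackedCall call = it.next();
            if (nowMillis - call.startMillis >= timeoutMillis) {
                call.aborted = true; // a real server would also release resources here
                it.remove();         // drop the call so it cannot linger forever
                abortedIds.add(call.id);
            }
        }
        return abortedIds;
    }

    int inFlightCount() {
        return inFlight.size();
    }
}
```

In a real server, something like a periodic task would presumably call
abortWedged() with the current time. The timeout value, and whether aborting
should also close the underlying connection, are open questions for this issue.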