Abhishek Singh Chouhan created HBASE-19215:
----------------------------------------------
Summary: Incorrect exception handling on the client causes
incorrect call timeouts and byte buffer allocations on the server
Key: HBASE-19215
URL: https://issues.apache.org/jira/browse/HBASE-19215
Project: HBase
Issue Type: Bug
Affects Versions: 1.3.1
Reporter: Abhishek Singh Chouhan
Assignee: Abhishek Singh Chouhan
Ran into an OOME on the client: java.lang.OutOfMemoryError: Direct buffer
memory.
When we encounter an unhandled exception during the channel write in
RpcClientImpl
{noformat}
checkIsOpen(); // Now we're checking that it didn't became idle in between.
try {
call.callStats.setRequestSizeBytes(IPCUtil.write(this.out, header,
call.param,
cellBlock));
} catch (IOException e) {
{noformat}
we end up leaving the connection open, because only IOException is caught
here. This becomes especially problematic when the unhandled exception occurs
after the length of our request has been written to the channel but before the
params and cell blocks are written
{noformat}
*dos.write(Bytes.toBytes(totalSize));*
// This allocates a buffer that is the size of the message internally.
header.writeDelimitedTo(dos);
if (param != null) param.writeDelimitedTo(dos);
if (cellBlock != null) dos.write(cellBlock.array(), 0,
cellBlock.remaining());
dos.flush();
return totalSize;
{noformat}
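The partial write described above can be reproduced in isolation. The following is a minimal sketch, not the actual RpcClientImpl code: FakeConnection, write and send are hypothetical names, and an in-memory stream stands in for the socket. It shows that an unchecked exception thrown after the length prefix is written leaves 4 bytes on the wire and bypasses a handler that only closes on IOException.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class PartialFrameDemo {
    /** Hypothetical stand-in for a client connection; not the real RpcClientImpl. */
    static final class FakeConnection {
        final ByteArrayOutputStream wire = new ByteArrayOutputStream();
        final DataOutputStream out = new DataOutputStream(wire);
        boolean closed = false;
    }

    /** Writes the length prefix, then fails before any body bytes are written. */
    static void write(FakeConnection c, byte[] body) throws IOException {
        c.out.writeInt(body.length); // the 4-byte length prefix reaches the stream
        // Stand-in for any unchecked failure while serializing the param.
        throw new IllegalStateException("simulated failure during param write");
    }

    /**
     * Sends one request. With closeOnAnyThrowable == false, only IOException
     * closes the connection, which mirrors the behaviour reported here.
     */
    static FakeConnection send(boolean closeOnAnyThrowable) {
        FakeConnection c = new FakeConnection();
        try {
            write(c, new byte[100]);
        } catch (IOException e) {
            c.closed = true;
        } catch (RuntimeException e) {
            // A real client would rethrow to the retry layer; the demo swallows
            // the exception so the connection state can be inspected.
            if (closeOnAnyThrowable) {
                c.closed = true;
            }
        }
        return c;
    }
}
```

Calling PartialFrameDemo.send(false) returns a connection that is still open with only the length prefix written, while send(true) marks it closed so that a retry would have to use a fresh connection.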
After reading the length, the RS allocates a ByteBuffer of that size and
expects it to be filled with data. However, when we encounter an exception
during the param write, we release the write lock in RpcClientImpl and do not
close the connection; the exception is handled at
AbstractRpcClient.callBlockingMethod and the call is retried.

The next client request to the same RS writes to the same channel, but the
server interprets it as part of the previous request and errors out during
proto conversion, since the request is considered malformed (in the worst case,
could this be misinterpreted as wrong data?). The remaining data of the current
request is then read (the current request is larger than the previous request's
partially filled ByteBuffer) and is misinterpreted as the size of a new
request; in my case this was in GBs. All client requests then time out, since
this ByteBuffer is never completely filled.

We should close the connection for any Throwable, not just IOException.
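The server-side misframing can be illustrated with a small simulation. This is a sketch under simplifying assumptions, not the HBase wire format: frames are a 4-byte length followed by raw bytes, the sizes 100 and 200 and the 0x7F fill byte are arbitrary, and FramingDesyncDemo is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

class FramingDesyncDemo {
    /** Returns {firstLength, bogusSecondLength, bytesLeftOnWire}. */
    static long[] simulate() {
        try {
            ByteArrayOutputStream wire = new ByteArrayOutputStream();
            DataOutputStream client = new DataOutputStream(wire);

            // Request 1: the length prefix is written, the body never follows.
            client.writeInt(100);

            // Request 2 (the retry on the same connection): a complete frame.
            byte[] body = new byte[200];
            Arrays.fill(body, (byte) 0x7F);
            client.writeInt(body.length);
            client.write(body);

            DataInputStream server =
                new DataInputStream(new ByteArrayInputStream(wire.toByteArray()));

            // The server trusts the stale prefix and fills a 100-byte buffer
            // with request 2's length prefix plus 96 bytes of its body.
            int firstLength = server.readInt();
            server.readFully(new byte[firstLength]);

            // The next 4 body bytes are now read as a frame length.
            int bogusLength = server.readInt();

            return new long[] { firstLength, bogusLength, server.available() };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In this run the second "length" comes out as 0x7F7F7F7F (about 2 GB) while only 100 bytes remain on the wire, so a server waiting to fill a buffer of that size would never complete the read.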