[ 
https://issues.apache.org/jira/browse/HBASE-16165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15359877#comment-15359877
 ] 

Duo Zhang commented on HBASE-16165:
-----------------------------------

We hit this recently, but it only happens on our legacy 0.94 clusters. We also 
found that there is another bug in 0.94.

In 0.94, when we cannot write back the whole response on the first attempt, we 
attach the call to the channel's SelectionKey and never detach it. So if we 
have lots of connections whose selection key has a call attached, and the 
call's param field is large (this usually happens when replication is 
enabled), then we will run into OOM.
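
To illustrate the retention described above, here is a minimal sketch using 
plain java.nio. It is not the 0.94 server code: PendingCall, retainedBytes and 
simulate are hypothetical names, and a Pipe stands in for a client socket. It 
only shows that an object attached to a SelectionKey stays reachable until it 
is detached with attach(null).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.ArrayList;
import java.util.List;

final class AttachmentLeakSketch {
    /** Hypothetical stand-in for a server call that keeps its request payload. */
    static final class PendingCall {
        final byte[] param;
        PendingCall(int paramBytes) { this.param = new byte[paramBytes]; }
    }

    /** Sums the payload bytes still reachable through the selector's keys. */
    static long retainedBytes(Selector selector) {
        long total = 0;
        for (SelectionKey key : selector.keys()) {
            Object attached = key.attachment();
            if (attached instanceof PendingCall) {
                total += ((PendingCall) attached).param.length;
            }
        }
        return total;
    }

    /**
     * Registers one channel per simulated connection with a call attached.
     * Returns {bytes retained while attached, bytes retained after detaching}.
     */
    static long[] simulate(int connections, int paramBytes) {
        List<Pipe> pipes = new ArrayList<>();
        try (Selector selector = Selector.open()) {
            for (int i = 0; i < connections; i++) {
                Pipe pipe = Pipe.open(); // assumption: stands in for a client socket
                pipes.add(pipe);
                pipe.sink().configureBlocking(false);
                pipe.sink().register(selector, SelectionKey.OP_WRITE,
                        new PendingCall(paramBytes));
            }
            long whileAttached = retainedBytes(selector);
            for (SelectionKey key : selector.keys()) {
                key.attach(null); // the detach step that the comment says is missing
            }
            long afterDetach = retainedBytes(selector);
            return new long[] {whileAttached, afterDetach};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            for (Pipe pipe : pipes) {
                try {
                    pipe.sink().close();
                    pipe.source().close();
                } catch (IOException ignored) {
                    // best-effort cleanup in a sketch
                }
            }
        }
    }

    public static void main(String[] args) {
        long[] r = simulate(4, 1024);
        System.out.println("attached=" + r[0] + " detached=" + r[1]);
    }
}
```

With many connections and large params, the first number is what stays on the 
heap for as long as the keys keep their attachments.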

So for HBase 0.98+, I think this is only theoretical. It could only happen if a 
client keeps sending large put requests but never receives the responses. Let's 
modify the priority. :)

> Decrease RpcServer.callQueueSize before writeResponse causes OOM
> ----------------------------------------------------------------
>
>                 Key: HBASE-16165
>                 URL: https://issues.apache.org/jira/browse/HBASE-16165
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Duo Zhang
>
> In RpcServer, we use {{callQueueSizeInBytes}} to avoid queuing too many calls, 
> which would cause OOM. But in {{CallRunner.run}}, we decrease it before 
> sending the response back. And even after calling {{sendResponseIfReady}}, the 
> call object could stay in our heap for a long time if we cannot write out the 
> response (that's why we need a Responder thread...). This makes it possible 
> for the actual size of all call objects in the heap to be larger than 
> {{maxQueueSizeInBytes}}, which causes OOM.
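
The accounting gap in the description above can be sketched with a toy model. 
This is not RpcServer code: the class, the admission check and the 
releaseBeforeWrite flag are assumptions made for illustration. It models a 
client whose responses can never be written, so every admitted call stays on 
the heap. Releasing the counter only after the write is shown purely as a 
contrast, not as the fix adopted for this issue.

```java
final class CallQueueAccountingSketch {
    /** Outcome of one simulated run. */
    static final class Result {
        final int admitted;        // calls that passed the size check
        final long countedBytes;   // what the queue-size counter reports at the end
        final long retainedBytes;  // what the call objects actually occupy
        Result(int admitted, long countedBytes, long retainedBytes) {
            this.admitted = admitted;
            this.countedBytes = countedBytes;
            this.retainedBytes = retainedBytes;
        }
    }

    /**
     * Simulates `attempts` incoming calls of `callBytes` each whose responses
     * are never written out (the client does not read).
     *
     * @param releaseBeforeWrite true models decreasing the counter when the
     *        handler finishes, before the response is written; false models
     *        decreasing it only after the write, which never happens here.
     */
    static Result simulate(long maxQueueBytes, long callBytes, int attempts,
                           boolean releaseBeforeWrite) {
        long counted = 0;
        long retained = 0;
        int admitted = 0;
        for (int i = 0; i < attempts; i++) {
            if (counted + callBytes > maxQueueBytes) {
                continue; // rejected by the (assumed) admission check
            }
            counted += callBytes;
            retained += callBytes; // the call object is now on the heap
            admitted++;
            if (releaseBeforeWrite) {
                counted -= callBytes; // counter drops, but the call is still retained
            }
            // The response cannot be written, so `retained` never decreases.
        }
        return new Result(admitted, counted, retained);
    }

    public static void main(String[] args) {
        Result early = simulate(100, 40, 10, true);
        Result late = simulate(100, 40, 10, false);
        System.out.println("early: admitted=" + early.admitted
                + " counted=" + early.countedBytes + " retained=" + early.retainedBytes);
        System.out.println("late:  admitted=" + late.admitted
                + " counted=" + late.countedBytes + " retained=" + late.retainedBytes);
    }
}
```

In the early-release run the counter never limits admission, so the retained 
bytes grow past the configured maximum. In the late-release run the retained 
bytes stay within it.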



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
