[ 
https://issues.apache.org/jira/browse/HBASE-17408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15796238#comment-15796238
 ] 

Enis Soztutar commented on HBASE-17408:
---------------------------------------

Ted, giving more context definitely helps others to follow along. 

I have seen applications sending very big "batches" in two different settings. 
In the first instance, the application was sending multi-get requests, and each 
RPC would end up accumulating about 5K get requests (because the application 
would call HTable.get(List) with default settings with a list of that size). 
The server would have to buffer all the responses, causing OOM on the server 
side. 

In the second instance (which is the same reason Ted logged this issue), the 
application was doing multi requests with Increment. In this case, it was doing 
15K increments in a single RPC, which would take longer than the RPC timeout. 
The following RPCs would then get ~15K exceptions due to nonce collisions 
(because the previous RPC does in fact succeed regardless of the timeout on the 
client side). That RPC response is multi-GB in size, causing OOMs. 

I have not checked the recent status of the AP, especially with HBASE-16224, 
but even with that, we should have both a heap size limit and a limit on the 
number of actions per batch. With default settings, doing HTable.get(List) with 
10K gets in that list should not cause OOM on the server side (same with 
HTable.batch()). If we already have limits for these (sorry, I did not check 
yet), then we should at least lower the default values. 
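
Until such a limit exists, an application can cap the number of actions per 
RPC itself by splitting its list before calling the client. The following is 
only a rough sketch of that workaround: BatchChunker and chunk() are 
hypothetical names, not part of the HBase client API, and the right value for 
maxPerBatch is an assumption that depends on row and response sizes. 

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the HBase client API.
final class BatchChunker {
    private BatchChunker() {}

    /** Splits items into consecutive sublists of at most maxPerBatch elements. */
    static <T> List<List<T>> chunk(List<T> items, int maxPerBatch) {
        if (maxPerBatch <= 0) {
            throw new IllegalArgumentException("maxPerBatch must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < items.size(); start += maxPerBatch) {
            int end = Math.min(start + maxPerBatch, items.size());
            // Copy the view so each chunk is independent of the caller's list.
            chunks.add(new ArrayList<>(items.subList(start, end)));
        }
        return chunks;
    }
}
```

Each chunk would then be passed to HTable.get(List) or HTable.batch() in turn, 
so that no single call is handed more than maxPerBatch actions. 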

> Introduce per request limit by number of mutations
> --------------------------------------------------
>
>                 Key: HBASE-17408
>                 URL: https://issues.apache.org/jira/browse/HBASE-17408
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Ted Yu
>
> HBASE-16224 introduced hbase.client.max.perrequest.heapsize to limit the 
> amount of data sent from client.
> We should consider adding per request limit through the number of mutations 
> in a batch.
> In recent troubleshooting sessions, customer had to do this in their 
> application code to avoid OOME on the server side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
