[ https://issues.apache.org/jira/browse/HBASE-19236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vikas Vishwakarma reassigned HBASE-19236:
-----------------------------------------

    Assignee: Harshal Jain

> Tune client backoff trigger logic and backoff time in 
> ExponentialClientBackoffPolicy
> ------------------------------------------------------------------------------------
>
>                 Key: HBASE-19236
>                 URL: https://issues.apache.org/jira/browse/HBASE-19236
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Vikas Vishwakarma
>            Assignee: Harshal Jain
>
> We were evaluating the ExponentialClientBackoffPolicy (HBASE-12986) for 
> implementing basic service protection and usage quota allocation for a few 
> heavy-loading clients, especially M/R-based HBase clients. However, we 
> observed that ExponentialClientBackoffPolicy slows down the client 
> dramatically even when there is not much load on the HBase cluster. 
> A simple multithreaded write-throughput client without 
> ExponentialClientBackoffPolicy enabled was able to complete in less than 
> 5 mins on a 40-node cluster (~100G data). 
> The same client took ~10 hours to complete with 
> ExponentialClientBackoffPolicy enabled and the default 
> DEFAULT_MAX_BACKOFF of 5 mins. 
> Even after reducing DEFAULT_MAX_BACKOFF to 1 min, the client took ~2 
> hours to complete. 
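> For reference, this is roughly how we enabled the policy on the client 
> side. The config constants are quoted from memory (HConstants and 
> ClientBackoffPolicyFactory), so please double-check them against the 
> client code: 
>     import org.apache.hadoop.conf.Configuration;
>     import org.apache.hadoop.hbase.HBaseConfiguration;
>     import org.apache.hadoop.hbase.HConstants;
>     import org.apache.hadoop.hbase.client.backoff.ClientBackoffPolicyFactory;
>     import org.apache.hadoop.hbase.client.backoff.ExponentialClientBackoffPolicy;
>     Configuration conf = HBaseConfiguration.create();
>     // enable region load stats so the client can compute a backoff at all
>     conf.setBoolean(HConstants.ENABLE_CLIENT_BACKPRESSURE, true);
>     // plug in the exponential backoff policy
>     conf.set(ClientBackoffPolicyFactory.BACKOFF_POLICY_CLASS,
>         ExponentialClientBackoffPolicy.class.getName());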
> The current ExponentialClientBackoffPolicy decides the backoff time based 
> on 3 factors: 
>     // Factor in memstore load
>     double percent = regionStats.getMemstoreLoadPercent() / 100.0;
>     // Factor in heap occupancy
>     float heapOccupancy = regionStats.getHeapOccupancyPercent() / 100.0f;
>     // Factor in compaction pressure, 1.0 means heavy compaction pressure
>     float compactionPressure = regionStats.getCompactionPressure() / 100.0f;
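> These factors are then folded into a multiplier on the policy's configured 
> maximum backoff (maxBackoff below stands for that configured value, 
> DEFAULT_MAX_BACKOFF by default). The snippet is a simplified sketch of that 
> combination, not the exact getBackoffTime() code: 
>     // take the worst of the three pressure signals, each normalized to [0, 1]
>     double pressure = Math.max(percent, Math.max(heapOccupancy, compactionPressure));
>     // a power curve keeps the backoff small until pressure approaches 1.0
>     double multiplier = Math.min(1.0, Math.pow(pressure, 4.0));
>     // scale the configured maximum backoff by the multiplier
>     long backoffTimeMs = (long) (multiplier * maxBackoff);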
> However, according to our test observations the client backoff is getting 
> triggered even when there is hardly any load on the cluster. We need to 
> re-evaluate the existing logic, or implement a different policy that is 
> better customized to our needs. 
> One idea is to base the backoff directly on compactionQueueLength instead 
> of heap occupancy etc. Consider a case where there is a high-throughput 
> write load and compaction is still able to keep up with the rate of 
> memstore flushes, compacting files at the same rate they are flushed. In 
> this case the memstore can be full and heap occupancy can be high, but 
> that is not necessarily an indicator that the service is falling behind on 
> processing the client load and that the client needs to back off; we are 
> just utilizing the full write throughput of the system, which is good. 
> However, if the compaction queue starts building up, staying continuously 
> above a threshold and increasing, that is a reliable indicator that the 
> system cannot keep up with the input load and is slowly falling behind. 
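> As an illustration only, a policy keyed on the compaction queue could look 
> roughly like the sketch below. The class name, config keys and the 
> getCompactionQueueLength() accessor are all made up for the example; the 
> region stats reported back to the client would also need to be extended to 
> carry the compaction queue length for this to work. 
>     import org.apache.hadoop.conf.Configuration;
>     import org.apache.hadoop.hbase.ServerName;
>     import org.apache.hadoop.hbase.client.backoff.ClientBackoffPolicy;
>     import org.apache.hadoop.hbase.client.backoff.ServerStatistics;
>     public class CompactionQueueBackoffPolicy implements ClientBackoffPolicy {
>       private final long maxBackoff;     // upper bound on backoff, in ms
>       private final int queueThreshold;  // queue length where backoff starts
>       private final int queueSaturation; // queue length where backoff hits max
>       public CompactionQueueBackoffPolicy(Configuration conf) {
>         this.maxBackoff = conf.getLong("hbase.client.queue-backoff.max", 60 * 1000L);
>         this.queueThreshold = conf.getInt("hbase.client.queue-backoff.threshold", 10);
>         this.queueSaturation = conf.getInt("hbase.client.queue-backoff.saturation", 100);
>       }
>       @Override
>       public long getBackoffTime(ServerName server, byte[] region, ServerStatistics stats) {
>         if (stats == null || stats.getStatsForRegion(region) == null) {
>           return 0; // no stats for this region yet, so do not back off
>         }
>         // hypothetical accessor: assumes the queue length is part of the region stats
>         int queueLength = stats.getStatsForRegion(region).getCompactionQueueLength();
>         if (queueLength <= queueThreshold) {
>           return 0; // compaction is keeping up, no reason to slow the client
>         }
>         // scale linearly from 0 at the threshold up to maxBackoff at saturation
>         double fraction = Math.min(1.0,
>             (double) (queueLength - queueThreshold) / (queueSaturation - queueThreshold));
>         return (long) (fraction * maxBackoff);
>       }
>     }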



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
