[
https://issues.apache.org/jira/browse/HBASE-19236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vikas Vishwakarma reassigned HBASE-19236:
-----------------------------------------
Assignee: Harshal Jain
> Tune client backoff trigger logic and backoff time in
> ExponentialClientBackoffPolicy
> ------------------------------------------------------------------------------------
>
> Key: HBASE-19236
> URL: https://issues.apache.org/jira/browse/HBASE-19236
> Project: HBase
> Issue Type: Improvement
> Reporter: Vikas Vishwakarma
> Assignee: Harshal Jain
>
> We were evaluating ExponentialClientBackoffPolicy (HBASE-12986) for
> implementing basic service protection and usage quota allocation for a few
> heavy-loading clients, especially M/R-job-based HBase clients. However, it was
> observed that ExponentialClientBackoffPolicy slows the client down
> dramatically even when there is not much load on the HBase cluster.
> A simple multithreaded write-throughput client, without
> ExponentialClientBackoffPolicy enabled, was able to complete in less than 5
> minutes on a 40-node cluster (~100 GB of data).
> The same client took ~10 hours to complete with ExponentialClientBackoffPolicy
> enabled and the default DEFAULT_MAX_BACKOFF of 5 minutes.
> Even after reducing DEFAULT_MAX_BACKOFF to 1 minute, the client took ~2 hours
> to complete.
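> For reference, a minimal sketch of enabling the policy and lowering the cap
> for a run like the one above, assuming the BACKOFF_POLICY_CLASS and
> MAX_BACKOFF_KEY constants exposed by ClientBackoffPolicy and
> ExponentialClientBackoffPolicy (worth re-checking against the HBase version
> in use):
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.hbase.HBaseConfiguration;
> import org.apache.hadoop.hbase.client.Connection;
> import org.apache.hadoop.hbase.client.ConnectionFactory;
> import org.apache.hadoop.hbase.client.backoff.ClientBackoffPolicy;
> import org.apache.hadoop.hbase.client.backoff.ExponentialClientBackoffPolicy;
> Configuration conf = HBaseConfiguration.create();
> // Plug in the exponential backoff policy on the client side
> conf.set(ClientBackoffPolicy.BACKOFF_POLICY_CLASS,
>     ExponentialClientBackoffPolicy.class.getName());
> // Lower the backoff cap from the 5 minute default to 1 minute
> conf.setLong(ExponentialClientBackoffPolicy.MAX_BACKOFF_KEY, 60 * 1000L);
> Connection connection = ConnectionFactory.createConnection(conf);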
> The current ExponentialClientBackoffPolicy decides the backoff time based on
> three factors:
> // Factor in memstore load
> double percent = regionStats.getMemstoreLoadPercent() / 100.0;
> // Factor in heap occupancy
> float heapOccupancy = regionStats.getHeapOccupancyPercent() / 100.0f;
> // Factor in compaction pressure, 1.0 means heavy compaction pressure
> float compactionPressure = regionStats.getCompactionPressure() / 100.0f;
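> For context, roughly how these factors then turn into a sleep time (a
> simplified sketch, not verbatim HBase code; the real getBackoffTime() also
> scales heap occupancy between low/high watermarks before comparing):
> // Take the strongest of the three pressure signals
> double pressure = Math.max(percent, Math.max(heapOccupancy, compactionPressure));
> // Raise to a power so that small pressures map to small backoffs
> double multiplier = Math.min(1.0, Math.pow(pressure, 4.0));
> // Scale by the configured cap (DEFAULT_MAX_BACKOFF unless overridden)
> long backoffTime = (long) (multiplier * maxBackoff);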
> However, according to our test observations, client backoff is being triggered
> even when there is hardly any load on the cluster. We need to re-evaluate the
> existing logic, or probably implement a different policy that is customized
> for and better suited to our needs.
> One of the ideas is to base the backoff directly on compactionQueueLength
> instead of heap occupancy etc. Consider a case where there is a
> high-throughput write load and compaction is still able to keep up with the
> rate of memstore flushes, compacting files at the same rate they are flushed.
> In this case the memstore can be full and heap occupancy can be high, but that
> is not necessarily an indicator that the service is falling behind on
> processing the client load and that the client needs to back off; we are just
> utilizing the full write throughput of the system, which is good. However, if
> the compaction queue starts building up, stays continuously above a threshold
> and keeps increasing, then that is a reliable indicator that the system is not
> able to keep up with the input load and is slowly falling behind.
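> A very rough sketch of what such a policy could look like, assuming the
> per-region statistics were extended to expose the compaction queue length
> (the getCompactionQueueLength() accessor, the config keys and the threshold
> below are hypothetical, not existing HBase API):
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.hbase.ServerName;
> import org.apache.hadoop.hbase.client.backoff.ClientBackoffPolicy;
> import org.apache.hadoop.hbase.client.backoff.ServerStatistics;
> public class CompactionQueueBackoffPolicy implements ClientBackoffPolicy {
>   private final long maxBackoff;     // cap on the sleep time, in ms
>   private final int queueThreshold;  // queue length below which we never back off
>   public CompactionQueueBackoffPolicy(Configuration conf) {
>     // Hypothetical config keys, for illustration only
>     this.maxBackoff = conf.getLong("hbase.client.compaction-queue-backoff.max", 60 * 1000L);
>     this.queueThreshold = conf.getInt("hbase.client.compaction-queue-backoff.threshold", 10);
>   }
>   @Override
>   public long getBackoffTime(ServerName serverName, byte[] region, ServerStatistics stats) {
>     if (stats == null) {
>       return 0; // no stats for this server yet, don't back off
>     }
>     ServerStatistics.RegionStatistics regionStats = stats.getStatsForRegion(region);
>     if (regionStats == null) {
>       return 0; // no stats for this region yet, don't back off
>     }
>     // Hypothetical: requires the server to ship the compaction queue length
>     // back to the client as part of the region statistics.
>     int queueLength = regionStats.getCompactionQueueLength();
>     if (queueLength <= queueThreshold) {
>       return 0; // compaction is keeping up, don't slow the client down
>     }
>     // Back off proportionally to how far the queue is above the threshold.
>     double pressure = Math.min(1.0, (queueLength - queueThreshold) / (double) queueThreshold);
>     return (long) (pressure * maxBackoff);
>   }
> }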
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)