[
https://issues.apache.org/jira/browse/HBASE-24420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
wuchang updated HBASE-24420:
----------------------------
Summary: BulkLoad Will Fall Into Unbelievable Retry Attempt in Some case
(was: BulkLoad May Fall Into Unbelievable Retry Attempt in Some case)
> BulkLoad Will Fall Into Unbelievable Retry Attempt in Some case
> ---------------------------------------------------------------
>
> Key: HBASE-24420
> URL: https://issues.apache.org/jira/browse/HBASE-24420
> Project: HBase
> Issue Type: Bug
> Reporter: wuchang
> Priority: Major
>
> In https://issues.apache.org/jira/browse/HBASE-14541, the retry logic changed
> from a configurable retry count (set by the configuration item
> hbase.bulkload.retries.number) to the retry logic below, in order to handle
> region splits that happen during a bulk load:
> {code:java}
> // Configured retry limit (default 10).
> int maxRetries = getConf().getInt("hbase.bulkload.retries.number", 10);
> // Raise the limit to at least (number of region start keys + 1).
> maxRetries = Math.max(maxRetries, startEndKeys.getFirst().length + 1);
> if (maxRetries != 0 && count >= maxRetries) {
>   throw new IOException("Retry attempted " + count +
>     " times without completing, bailing out");
> }
> {code}
> That change caused another problem in our cluster:
> Our table has 2000 regions, and our bulk load failed because of a
> configuration issue (unrelated to this case, so ignore the failure reason).
> The bulk load then fell into a retry loop, and after about 200 retries our
> HDFS crashed with an OOM.
> During this time, no region split of the HBase table ever happened.
> I think the patch in HBASE-14541 didn't handle the unrecoverable retry case.
> In this case (and I think many causes can lead to unrecoverable retries), the
> meaningless retry attempts become a disaster. The limit is also effectively
> un-configurable: maxRetries is at least the number of regions plus one (2001
> for our table), so hbase.bulkload.retries.number cannot lower it, and
> we cannot change the number of regions of our table.
>
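The arithmetic behind the report can be sketched as follows. This is only an illustration, not HBase code: the class RetryCapSketch, both method names, and the hardLimit parameter are hypothetical. The first method mirrors the Math.max expression quoted above; the second shows one possible shape of an additional limit that the region count cannot raise, assuming such a setting existed.

```java
// Sketch only: the class, method names and hardLimit are hypothetical, not HBase APIs.
final class RetryCapSketch {

    // Mirrors the quoted logic: the limit is the larger of the configured
    // value and (number of regions + 1).
    static int effectiveMaxRetries(int configuredRetries, int regionCount) {
        return Math.max(configuredRetries, regionCount + 1);
    }

    // Hypothetical alternative: apply an extra hard limit on top of the
    // region-based value. A hardLimit <= 0 means "no hard limit".
    static int cappedMaxRetries(int configuredRetries, int regionCount, int hardLimit) {
        int effective = effectiveMaxRetries(configuredRetries, regionCount);
        return hardLimit > 0 ? Math.min(effective, hardLimit) : effective;
    }

    public static void main(String[] args) {
        // Default configuration (10) with a 2000-region table.
        System.out.println(effectiveMaxRetries(10, 2000));   // prints 2001
        // Same table with an assumed hard limit of 20.
        System.out.println(cappedMaxRetries(10, 2000, 20));  // prints 20
    }
}
```

With the default of 10 and 2000 regions, the region-based term dominates, which is consistent with the load still retrying after about 200 attempts in the report above.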
--
This message was sent by Atlassian Jira
(v8.3.4#803005)