wuchang created HBASE-24420:
-------------------------------
Summary: BulkLoad May Fall Into Unbelievable Retry Attempt in Some
case
Key: HBASE-24420
URL: https://issues.apache.org/jira/browse/HBASE-24420
Project: HBase
Issue Type: Bug
Reporter: wuchang
In https://issues.apache.org/jira/browse/HBASE-14541, the retry logic was changed
from a configurable number of retries (set by the configuration item
hbase.bulkload.retries.number) to the logic below, in order to handle region
splits that happen during a bulk load:
{code:java}
int maxRetries = getConf().getInt("hbase.bulkload.retries.number", 10);
maxRetries = Math.max(maxRetries, startEndKeys.getFirst().length + 1);
if (maxRetries != 0 && count >= maxRetries) {
  throw new IOException("Retry attempted " + count +
      " times without completing, bailing out");
}
{code}
This change caused another problem in our cluster:
Our table has 2000 regions, and our bulk load failed because of a configuration
issue (unrelated to this case, so ignore the failure reason). The bulk load then
fell into a retry storm, and after about 200 retries our HDFS crashed with an
OOM.
During this time, no region split ever happened on the table.
I think the patch in HBASE-14541 does not handle the unrecoverable case. In such
a case (and I think many causes can lead to an unrecoverable failure), the
pointless retry attempts become a disaster. The limit is also effectively
un-configurable: maxRetries is raised to at least the region count plus one
(2001 for our table), and we cannot change the number of regions of our table.
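
To make the arithmetic concrete, here is a minimal standalone sketch of the limit computation quoted above. The class name BulkLoadRetryLimit and the method effectiveMaxRetries are hypothetical and exist only for this illustration; they are not part of HBase. The sketch assumes startEndKeys.getFirst().length equals the number of regions in the table.

```java
// Hypothetical illustration only: this class is not part of HBase.
public class BulkLoadRetryLimit {

    // Mirrors the quoted expression:
    //   Math.max(maxRetries, startEndKeys.getFirst().length + 1)
    // regionCount stands in for startEndKeys.getFirst().length (an assumption).
    static int effectiveMaxRetries(int configuredRetries, int regionCount) {
        return Math.max(configuredRetries, regionCount + 1);
    }

    public static void main(String[] args) {
        // Default configuration (10) with a 2000-region table.
        System.out.println(effectiveMaxRetries(10, 2000)); // prints 2001

        // Lowering hbase.bulkload.retries.number does not lower the limit.
        System.out.println(effectiveMaxRetries(1, 2000));  // prints 2001

        // The configured value only wins on tables with few regions.
        System.out.println(effectiveMaxRetries(10, 3));    // prints 10
    }
}
```

Under these assumptions, the result is always at least 1, so the maxRetries != 0 check in the quoted code can never disable the limit, and for a large table the configured value has no effect on when the retries stop.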