wuchang created HBASE-24420:
-------------------------------

             Summary: BulkLoad May Fall Into Unbelievable Retry Attempt in Some case
                 Key: HBASE-24420
                 URL: https://issues.apache.org/jira/browse/HBASE-24420
             Project: HBase
          Issue Type: Bug
            Reporter: wuchang

In https://issues.apache.org/jira/browse/HBASE-14541, the retry logic was changed from a configurable retry count (configuration item hbase.bulkload.retries.number) to the logic below, in order to handle region splits that happen during a bulk load:

{code:java}
int maxRetries = getConf().getInt("hbase.bulkload.retries.number", 10);
maxRetries = Math.max(maxRetries, startEndKeys.getFirst().length + 1);
if (maxRetries != 0 && count >= maxRetries) {
  throw new IOException("Retry attempted " + count + " times without completing, bailing out");
}
{code}

This change caused another problem in our cluster. Our table has 2000 regions, and our bulk load failed because of a configuration issue (unrelated to this case, so the failure reason can be ignored). With 2000 regions, the retry limit becomes at least 2001, regardless of the value of hbase.bulkload.retries.number. The bulk load then fell into a retry loop, and after about 200 retries our HDFS crashed with an OOM. During this time, no region split ever happened on the HBase table.

I think the patch in HBASE-14541 does not handle the unrecoverable case. In such a case (and I think many causes can lead to an unrecoverable failure), the meaningless retry attempts become a disaster, and the limit cannot be configured, because we cannot change the number of regions of our table.
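
To make the arithmetic behind the report concrete, here is a minimal sketch of the limit that the quoted check computes. It is not HBase code: the class BulkLoadRetryBound and the method effectiveMaxRetries are hypothetical names used only for illustration, and regionCount is assumed to stand in for startEndKeys.getFirst().length.

```java
/** Sketch only: mirrors the arithmetic of the quoted check, not actual HBase code. */
class BulkLoadRetryBound {

    /**
     * Hypothetical helper: the retry limit the quoted snippet would compute.
     * regionCount is assumed to equal startEndKeys.getFirst().length.
     */
    static int effectiveMaxRetries(int configuredRetries, int regionCount) {
        // Same expression as in the snippet: the larger of the configured
        // value and (number of regions + 1) wins.
        return Math.max(configuredRetries, regionCount + 1);
    }

    public static void main(String[] args) {
        int regions = 2000; // table size from the report
        System.out.println(effectiveMaxRetries(10, regions)); // default setting: prints 2001
        System.out.println(effectiveMaxRetries(1, regions));  // lowered setting: prints 2001
        System.out.println(effectiveMaxRetries(10, 5));       // small table: prints 10
    }
}
```

Under these assumptions, lowering hbase.bulkload.retries.number has no effect on a table with many regions, because the region count dominates the Math.max. This matches the observation that the limit cannot be tuned without changing the number of regions.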