anoopsjohn commented on a change in pull request #1764:
URL: https://github.com/apache/hbase/pull/1764#discussion_r429521262
##########
File path:
hbase-server/src/main/java/org/apache/hadoop/hbase/tool/BulkLoadHFilesTool.java
##########
@@ -879,13 +881,21 @@ public void bulkHFile(ColumnFamilyDescriptorBuilder builder, FileStatus hfileSta
}
int maxRetries =
getConf().getInt(HConstants.BULKLOAD_MAX_RETRIES_NUMBER, 10);
-    maxRetries = Math.max(maxRetries, startEndKeys.size() + 1);
+
+    /**
+     * On the first attempt, retry up to the configured maximum retry number.
+     * Whenever we find that the region count has changed, set maxRetries to
+     * the region count; but if the region count has not changed, keep
+     * maxRetries at the configured BULKLOAD_MAX_RETRIES_NUMBER to avoid
+     * meaningless retry attempts.
+     */
+    if (count != 0 && previousRegionNum != startEndKeys.size())
Review comment:
This is a nice find.
Well, this will help your particular case, but I don't think it is a generic
solution.
What if a cluster has an issue like yours (a config issue) and a split also
happens during the configured retries? That would make the max retries very
large (when there are as many regions as in your case).
What if we instead reset the 'count' within the while loop only when we notice
a split happened between the current run and the previous one? If splits keep
happening at regular intervals this would still cause a never-ending loop, so
we would still need an upper bound. But blindly setting the retry count to
#regions + 1 just after seeing a split is something that concerns me.
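
A rough standalone sketch of that "reset the count on split, but keep a hard upper bound" idea. This is not the actual BulkLoadHFilesTool code; the class, method, and field names (`RetryPolicySketch`, `allowAttempt`, `absoluteCap`) are hypothetical and only illustrate the retry policy being suggested:

```java
// Standalone simulation of the suggested alternative (NOT HBase code; all
// names are hypothetical): keep the configured retry bound, reset the
// per-round attempt counter when the observed region count changes (a
// split/merge happened), and enforce a hard absolute cap so splits at
// regular intervals cannot produce a never-ending loop.
public class RetryPolicySketch {
  private final int maxRetries;     // analogue of BULKLOAD_MAX_RETRIES_NUMBER
  private final int absoluteCap;    // hard upper bound across all resets
  private int count = 0;            // attempts since the last observed split
  private int totalAttempts = 0;    // attempts overall, never reset
  private int previousRegionNum = -1;

  public RetryPolicySketch(int maxRetries, int absoluteCap) {
    this.maxRetries = maxRetries;
    this.absoluteCap = absoluteCap;
  }

  /** Returns true if one more bulk-load attempt should be made. */
  public boolean allowAttempt(int currentRegionNum) {
    if (previousRegionNum != -1 && currentRegionNum != previousRegionNum) {
      count = 0; // split detected between runs: grant a fresh round of retries
    }
    previousRegionNum = currentRegionNum;
    if (totalAttempts >= absoluteCap || count >= maxRetries) {
      return false;
    }
    count++;
    totalAttempts++;
    return true;
  }

  public static void main(String[] args) {
    RetryPolicySketch p = new RetryPolicySketch(3, 10);
    // Three attempts with a stable region count, then we give up...
    System.out.println(p.allowAttempt(5)); // true
    System.out.println(p.allowAttempt(5)); // true
    System.out.println(p.allowAttempt(5)); // true
    System.out.println(p.allowAttempt(5)); // false
    // ...unless a split is observed, which resets the counter.
    System.out.println(p.allowAttempt(6)); // true
  }
}
```

The point of `absoluteCap` is exactly the upper bound mentioned above: even if each retry round sees a fresh split, the loop terminates after at most `absoluteCap` total attempts, instead of growing with the number of regions.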
cc @saintstack - It seems you reviewed the original jira, so you might
remember the context. Any pointers, sir?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]