[
https://issues.apache.org/jira/browse/HDFS-15640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17220003#comment-17220003
]
Yiqun Lin commented on HDFS-15640:
----------------------------------
[~LiJinglun], the latest patch looks almost good to me.
A few minor comments:
*DistCpProcedure.java*
For the logic below:
{code:java}
+ boolean diffDistCpStageDone() throws IOException, RetryException {
+ int diffSize = getDiffSize();
+ if (diffSize <= diffThreshold && (forceCloseOpenFiles
+ || !verifyOpenFiles())) {
+ return true;
+ }
+ if (diffSize == 0) {
+ throw new RetryException();
+ } else {
+ return false;
+ }
+ }
{code}
When diffSize is not 0 but is no greater than diffThreshold, and
(forceCloseOpenFiles || !verifyOpenFiles()) evaluates to false, we should also
throw RetryException.
So the logic should look like the following, which is consistent with the
original logic.
{code:java}
boolean diffDistCpStageDone() throws IOException, RetryException {
  int diffSize = getDiffSize();
  if (diffSize <= diffThreshold) {
    if (forceCloseOpenFiles || !verifyOpenFiles()) {
      return true;
    } else {
      throw new RetryException();
    }
  }
  return false;
}
{code}
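To make the difference concrete, here is a minimal, self-contained sketch that puts the two variants side by side. It is only an illustration, not the patch itself: the class name DiffStageSketch, the helper outcome(), and the Check interface are hypothetical, the procedure's fields are passed in as parameters, and hasOpenFiles stands in for the result of verifyOpenFiles() (assuming it returns true when open files exist).

```java
class DiffStageSketch {
  /** Stand-in for the RetryException thrown by the procedure. */
  static class RetryException extends Exception {}

  /** The logic as written in the patch under review. */
  static boolean patchVersion(int diffSize, int diffThreshold,
      boolean forceCloseOpenFiles, boolean hasOpenFiles)
      throws RetryException {
    if (diffSize <= diffThreshold && (forceCloseOpenFiles || !hasOpenFiles)) {
      return true;
    }
    if (diffSize == 0) {
      throw new RetryException();
    } else {
      return false;
    }
  }

  /** The logic as suggested in this comment. */
  static boolean proposedVersion(int diffSize, int diffThreshold,
      boolean forceCloseOpenFiles, boolean hasOpenFiles)
      throws RetryException {
    if (diffSize <= diffThreshold) {
      if (forceCloseOpenFiles || !hasOpenFiles) {
        return true;               // ready for the final distcp round
      } else {
        throw new RetryException(); // diff is small enough, open files remain
      }
    }
    return false;                  // diff still above the threshold
  }

  /** Hypothetical helper type so both variants can be run the same way. */
  interface Check {
    boolean run() throws RetryException;
  }

  /** Maps the three possible results to a readable label. */
  static String outcome(Check check) {
    try {
      return check.run() ? "done" : "not done";
    } catch (RetryException e) {
      return "retry";
    }
  }

  public static void main(String[] args) {
    // diffSize = 3, diffThreshold = 5, open files present, no force close.
    System.out.println("patch:    "
        + outcome(() -> patchVersion(3, 5, false, true)));    // not done
    System.out.println("proposed: "
        + outcome(() -> proposedVersion(3, 5, false, true))); // retry
  }
}
```

The two variants only differ when the diff is non-zero, within the threshold, and the open files check fails. In every other case they give the same result.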
*FedBalanceOptions.java*
Please update the description of the DIFF_THRESHOLD option. I made a minor
rewrite to make it easier to understand.
{code:java}
final static Option DIFF_THRESHOLD = new Option("diffThreshold", true,
    "This specifies the threshold of the diff entries used in the"
    + " incremental copy stage. If the number of diff entries is no"
    + " greater than this threshold and the open files check is"
    + " satisfied (no open files, or force close all open files),"
    + " fedBalance will go to the final round of distcp. The default"
    + " value is 0, which means waiting until there is no diff.");
{code}
*HDFSFederationBalance.md*
Can we update 'Specify the threshold of the diff entries.' to 'Specify the
threshold of the diff entries used in the incremental copy stage.'?
*TestDistCpProcedure.java*
# Please add a cleanup operation in testDiffThreshold, as the other test
methods in this class do.
# We can add a new method buildContext(Path src, Path dst, String mount, int
diffThreshold) without changing the existing method. Changing the existing one
would require some unnecessary updates.
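For the second point, a rough sketch of the overload idea is below. Everything in it is a placeholder rather than the real test code: Context is a hypothetical stand-in for whatever buildContext actually returns, the paths are plain Strings instead of Path, and delegating with 0 is only an assumption based on the default value of the option.

```java
class BuildContextSketch {
  /** Hypothetical stand-in for the context object built by the test. */
  static class Context {
    final String src;
    final String dst;
    final String mount;
    final int diffThreshold;

    Context(String src, String dst, String mount, int diffThreshold) {
      this.src = src;
      this.dst = dst;
      this.mount = mount;
      this.diffThreshold = diffThreshold;
    }
  }

  // Existing helper: the signature is unchanged, so current callers
  // need no update. Delegating with 0 is an assumption for this sketch.
  static Context buildContext(String src, String dst, String mount) {
    return buildContext(src, dst, mount, 0);
  }

  // New overload, used only by the test that needs a threshold.
  static Context buildContext(String src, String dst, String mount,
      int diffThreshold) {
    return new Context(src, dst, mount, diffThreshold);
  }
}
```

With this shape only testDiffThreshold calls the four-argument version, and the other test methods stay as they are.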
> RBF: Add fast distcp threshold to FedBalance.
> ---------------------------------------------
>
> Key: HDFS-15640
> URL: https://issues.apache.org/jira/browse/HDFS-15640
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Jinglun
> Assignee: Jinglun
> Priority: Major
> Attachments: HDFS-15640.001.patch, HDFS-15640.002.patch
>
>
> Currently in the DistCpProcedure it must submit distcp round by round until
> there is no diff to go to the final distcp stage. The condition is very
> strict. If the distcp could finish in an acceptable period then we don't need
> to wait for no diff. For example if 3 consecutive distcp jobs all finish
> within 10 minutes then we can predict the final distcp could also finish
> within 10 minutes. So we can start the final distcp directly.