[ 
https://issues.apache.org/jira/browse/HDFS-15640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17220003#comment-17220003
 ] 

Yiqun Lin commented on HDFS-15640:
----------------------------------

[~LiJinglun], the latest patch looks almost good to me.
Some minor comments:

*DistCpProcedure.java*
For the logic below:
{code:java}
+  boolean diffDistCpStageDone() throws IOException, RetryException {
+    int diffSize = getDiffSize();
+    if (diffSize <= diffThreshold && (forceCloseOpenFiles
+        || !verifyOpenFiles())) {
+      return true;
+    }
+    if (diffSize == 0) {
+      throw new RetryException();
+    } else {
+      return false;
+    }
+  }
{code}
When diffSize is not 0 but is no greater than diffThreshold, and
(forceCloseOpenFiles || !verifyOpenFiles()) evaluates to false, we should also
throw RetryException.
So the logic above should look like the following, which is consistent with the
original logic.
{code:java}
  boolean diffDistCpStageDone() throws IOException, RetryException {
    int diffSize = getDiffSize();
    if (diffSize <= diffThreshold) {
      if (forceCloseOpenFiles || !verifyOpenFiles()) {
        return true;
      } else {
        throw new RetryException();
      }
    }

    return false;
  }
{code}
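To make the difference concrete, here is a minimal, self-contained sketch that runs both variants side by side. It is not the DistCpProcedure code itself: the class `DiffStageSketch`, the `Outcome` enum, and the parameter `verifyOpenFilesResult` (standing in for the return value of verifyOpenFiles()) are hypothetical names used only for illustration, and throwing RetryException is modelled as returning `RETRY`.

```java
final class DiffStageSketch {
  // DONE models "return true", NOT_DONE models "return false",
  // RETRY models "throw new RetryException()".
  enum Outcome { DONE, NOT_DONE, RETRY }

  // Decision logic as written in the patch under review.
  static Outcome patchLogic(int diffSize, int diffThreshold,
      boolean forceCloseOpenFiles, boolean verifyOpenFilesResult) {
    if (diffSize <= diffThreshold
        && (forceCloseOpenFiles || !verifyOpenFilesResult)) {
      return Outcome.DONE;
    }
    return diffSize == 0 ? Outcome.RETRY : Outcome.NOT_DONE;
  }

  // Decision logic proposed in this comment.
  static Outcome proposedLogic(int diffSize, int diffThreshold,
      boolean forceCloseOpenFiles, boolean verifyOpenFilesResult) {
    if (diffSize <= diffThreshold) {
      return (forceCloseOpenFiles || !verifyOpenFilesResult)
          ? Outcome.DONE : Outcome.RETRY;
    }
    return Outcome.NOT_DONE;
  }

  public static void main(String[] args) {
    int diffThreshold = 5; // arbitrary value chosen for the illustration
    // Open files check not satisfied: forceCloseOpenFiles is false and
    // verifyOpenFiles() is assumed to have returned true.
    for (int diffSize : new int[] {0, 3, 8}) {
      System.out.println("diffSize=" + diffSize
          + " patch=" + patchLogic(diffSize, diffThreshold, false, true)
          + " proposed=" + proposedLogic(diffSize, diffThreshold, false, true));
    }
  }
}
```

The two variants only disagree when 0 < diffSize <= diffThreshold and the open files check is not satisfied: the patch returns false there, while the proposed version retries.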

*FedBalanceOptions.java*
Please update the description of the DIFF_THRESHOLD option. I made a minor
rewrite to make it easier to understand.
{code:java}
final static Option DIFF_THRESHOLD = new Option("diffThreshold", true,
    "This specifies the threshold of the diff entries that used in"
        + " incremental copy stage. If the diff entries size is no greater"
        + " than this threshold and the open files check is satisfied(no"
        + " open files or force close all open files), the fedBalance will"
        + " go to the final round of distcp. Default value is 0, that means"
        + " waiting until there is no diff.");
{code}
 

 
*HDFSFederationBalance.md*
Can we update 'Specify the threshold of the diff entries.' to 'Specify the 
threshold of the diff entries that used in incremental copy stage.'?

*TestDistCpProcedure.java*
 # Please add a cleanup operation in testDiffThreshold, as the other test
methods in this class do.
 # We can add a new method buildContext(Path src, Path dst, String mount, int
diffThreshold) without changing the existing method. Changing the existing one
would require some unnecessary updates.
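
The overload idea in the second point could look roughly like the sketch below. Everything here is a hypothetical stand-in rather than the real test code: `BuildContextSketch` and its nested `Context` class are invented for illustration, plain strings replace Path, and the default of 0 is an assumption that mirrors the option's documented default.

```java
final class BuildContextSketch {
  // Hypothetical stand-in for the context object the test helper builds.
  static final class Context {
    final String src;
    final String dst;
    final String mount;
    final int diffThreshold;

    Context(String src, String dst, String mount, int diffThreshold) {
      this.src = src;
      this.dst = dst;
      this.mount = mount;
      this.diffThreshold = diffThreshold;
    }
  }

  // Existing three-argument helper: signature unchanged, so current
  // callers need no update. It delegates with the assumed default of 0.
  static Context buildContext(String src, String dst, String mount) {
    return buildContext(src, dst, mount, 0);
  }

  // New overload, used only by the test that needs a non-default threshold.
  static Context buildContext(String src, String dst, String mount,
      int diffThreshold) {
    return new Context(src, dst, mount, diffThreshold);
  }

  public static void main(String[] args) {
    System.out.println(buildContext("/src", "/dst", "/mount").diffThreshold);
    System.out.println(
        buildContext("/src", "/dst", "/mount", 7).diffThreshold);
  }
}
```

With this shape, only testDiffThreshold would call the four-argument version, and the other tests stay as they are.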

> RBF: Add fast distcp threshold to FedBalance.
> ---------------------------------------------
>
>                 Key: HDFS-15640
>                 URL: https://issues.apache.org/jira/browse/HDFS-15640
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Jinglun
>            Assignee: Jinglun
>            Priority: Major
>         Attachments: HDFS-15640.001.patch, HDFS-15640.002.patch
>
>
> Currently in the DistCpProcedure it must submit distcp round by round until 
> there is no diff to go to the final distcp stage. The condition is very 
> strict. If the distcp could finish in an acceptable period then we don't need 
> to wait for no diff. For example if 3 consecutive distcp jobs all finish 
> within 10 minutes then we can predict the final distcp could also finish 
> within 10 minutes. So we can start the final distcp directly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
