[
https://issues.apache.org/jira/browse/HDFS-15640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17218141#comment-17218141
]
Jinglun commented on HDFS-15640:
--------------------------------
Hi [~linyiqun], thanks for your nice comments! During the final distcp there
is a write-blocking period, and my original idea was to keep that period within
an acceptable range. So I made the assumption that if the last 3 consecutive
distcp rounds run very fast, so will the next one. The user needs to decide on
an acceptable time interval and set it when starting fedbalance.
Using the number of diff entries could achieve the same effect, so I'm also
happy with that approach. The only small problem is that it might not be easy
to know how much time the diffs will take to copy. But we can use a low value
to make sure the write-blocking period won't be long.
[~linyiqun], what do you think, shall we use the diff number or the time cost
as the threshold? Or both? I'm OK with all these choices :).
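To make the options concrete, here is a rough sketch of how the two thresholds
could be checked. The class, method and parameter names are hypothetical and
are not taken from the HDFS-15640 patch; treating "both" as "either condition
is enough" is also just an assumption for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical helper for illustration only; not part of the actual patch. */
class FastDistCpThreshold {
  private final int rounds;          // consecutive fast rounds required, e.g. 3
  private final long maxRoundMillis; // time threshold per round; <= 0 disables it
  private final long maxDiffEntries; // diff-entry threshold; < 0 disables it
  private final Deque<Long> recent = new ArrayDeque<>();

  FastDistCpThreshold(int rounds, long maxRoundMillis, long maxDiffEntries) {
    if (rounds <= 0) {
      throw new IllegalArgumentException("rounds must be positive");
    }
    this.rounds = rounds;
    this.maxRoundMillis = maxRoundMillis;
    this.maxDiffEntries = maxDiffEntries;
  }

  /** Remember how long the latest distcp round took, keeping only the last N. */
  void recordRound(long elapsedMillis) {
    recent.addLast(elapsedMillis);
    if (recent.size() > rounds) {
      recent.removeFirst();
    }
  }

  /** True if the last N rounds all finished within the time threshold. */
  boolean fastByTime() {
    if (maxRoundMillis <= 0 || recent.size() < rounds) {
      return false;
    }
    for (long elapsed : recent) {
      if (elapsed > maxRoundMillis) {
        return false;
      }
    }
    return true;
  }

  /** True if the pending diff is small enough according to the entry threshold. */
  boolean fastByDiff(long diffEntries) {
    return maxDiffEntries >= 0 && diffEntries <= maxDiffEntries;
  }

  /** No diff at all always qualifies; otherwise either threshold is enough. */
  boolean shouldStartFinalDistCp(long diffEntries) {
    return diffEntries == 0 || fastByTime() || fastByDiff(diffEntries);
  }
}
```

With rounds = 3 and maxRoundMillis = 600000 this matches the "3 consecutive
distcp jobs within 10 minutes" example from the issue description, and a low
maxDiffEntries value gives the diff-number variant.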
> RBF: Add fast distcp threshold to FedBalance.
> ---------------------------------------------
>
> Key: HDFS-15640
> URL: https://issues.apache.org/jira/browse/HDFS-15640
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Jinglun
> Assignee: Jinglun
> Priority: Major
> Attachments: HDFS-15640.001.patch
>
>
> Currently the DistCpProcedure must submit distcp round by round until there
> is no diff before it can go to the final distcp stage. This condition is very
> strict. If the distcp can finish in an acceptable period, then we don't need
> to wait for no diff. For example, if 3 consecutive distcp jobs all finish
> within 10 minutes, then we can predict the final distcp will also finish
> within 10 minutes, so we can start the final distcp directly.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)