[ 
https://issues.apache.org/jira/browse/HADOOP-18596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17685980#comment-17685980
 ] 

ASF GitHub Bot commented on HADOOP-18596:
-----------------------------------------

mehakmeet commented on code in PR #5308:
URL: https://github.com/apache/hadoop/pull/5308#discussion_r1100412015


##########
hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/util/DistCpUtils.java:
##########
@@ -613,8 +623,12 @@ public static void compareFileLengthsAndChecksums(long srcLen,
 
     //At this point, src & dest lengths are same. if length==0, we skip checksum
     if ((srcLen != 0) && (!skipCrc)) {
-      if (!checksumsAreEqual(sourceFS, source, sourceChecksum,
-          targetFS, target, srcLen)) {
+      CopyMapper.ChecksumComparison
+          checksumComparison = checksumsAreEqual(sourceFS, source, sourceChecksum,
+              targetFS, target, srcLen);
+      // If Checksum comparison is false set it to false, else set to true.
+      boolean checksumResult = !checksumComparison.equals(CopyMapper.ChecksumComparison.FALSE);

Review Comment:
   "checksumResult" is set to true for both the "INCOMPATIBLE" and "TRUE" 
results from the checksumsAreEqual() method, and to false otherwise, before 
execution continues to L632. The flow is therefore the same as before, since 
an incompatible result from this method was already treated as true.
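
For readers following along, the mapping described above can be sketched in 
isolation. This is a minimal illustration, not the PR's code: the enum below is 
a stand-in for CopyMapper.ChecksumComparison with the three values mentioned in 
this thread, and the class and method names are hypothetical.

```java
public class ChecksumResultSketch {
    // Stand-in for CopyMapper.ChecksumComparison (assumed to have these three values).
    enum ChecksumComparison { TRUE, FALSE, INCOMPATIBLE }

    // Hypothetical helper: collapses the tri-state result into the boolean
    // used by the length-and-checksum check. Only FALSE maps to false, so
    // INCOMPATIBLE keeps the pre-change behaviour of being treated as "equal".
    static boolean toChecksumResult(ChecksumComparison comparison) {
        return comparison != ChecksumComparison.FALSE;
    }

    public static void main(String[] args) {
        for (ChecksumComparison c : ChecksumComparison.values()) {
            System.out.println(c + " -> " + toChecksumResult(c));
        }
    }
}
```

Comparing enum constants with != is equivalent to the negated equals() call in 
the diff; either form gives the same result for all three values.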





> Distcp -update between different cloud stores to use modification time while 
> checking for file skip.
> ----------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-18596
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18596
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: tools/distcp
>            Reporter: Mehakmeet Singh
>            Assignee: Mehakmeet Singh
>            Priority: Major
>              Labels: pull-request-available
>
> Distcp -update currently relies on file size, block size, and checksum 
> comparisons to decide which files should be skipped or copied. 
> Since different cloud stores use different checksum algorithms, we should 
> add a modification time check to these comparisons.
> This would ensure that, while performing -update, files perceived to be 
> out of sync are copied. The machines between which the file 
> transfers occur should be in time sync to avoid any extra copies.
> This also covers improving testing and documentation for modification time 
> checks between different object stores to ensure no incorrect skipping of files.
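
One way to picture the skip decision described in the issue is the sketch 
below. It is an assumption-laden illustration rather than the DistCp 
implementation: the FileInfo holder, the canSkip name, and the exact 
modification time rule (skip only when the source is not newer than the 
target) are all hypothetical, and the block size check is left out for brevity.

```java
public class UpdateSkipSketch {
    // Stand-in for the tri-state checksum result discussed in this thread.
    enum ChecksumComparison { TRUE, FALSE, INCOMPATIBLE }

    // Hypothetical holder for the two file attributes this sketch needs.
    static final class FileInfo {
        final long length;
        final long modificationTime; // milliseconds since the epoch

        FileInfo(long length, long modificationTime) {
            this.length = length;
            this.modificationTime = modificationTime;
        }
    }

    // Returns true when the copy can be skipped during an update.
    static boolean canSkip(FileInfo source, FileInfo target,
                           ChecksumComparison checksums) {
        if (source.length != target.length) {
            return false; // different sizes: always copy
        }
        switch (checksums) {
            case TRUE:
                return true;  // checksums match: safe to skip
            case FALSE:
                return false; // checksums differ: copy
            default:
                // INCOMPATIBLE: checksums cannot be compared across stores,
                // so fall back to modification time (assumed rule: skip only
                // if the source is not newer than the target).
                return source.modificationTime <= target.modificationTime;
        }
    }

    public static void main(String[] args) {
        FileInfo source = new FileInfo(10L, 2000L);
        FileInfo target = new FileInfo(10L, 1000L);
        System.out.println(canSkip(source, target, ChecksumComparison.INCOMPATIBLE));
    }
}
```

Because the fallback compares timestamps from two different stores, clock skew 
between them can cause extra copies, which is why the description asks for the 
machines to be in time sync.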



