[ 
https://issues.apache.org/jira/browse/HADOOP-18596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17686220#comment-17686220
 ] 

ASF GitHub Bot commented on HADOOP-18596:
-----------------------------------------

mehakmeet commented on PR #5308:
URL: https://github.com/apache/hadoop/pull/5308#issuecomment-1423704609

   I've made the changes @steveloughran suggested, including changing ">" to 
">=". 
   
   I think either a strictly-greater or a greater-or-equal check is defensible. 
With ">=" we take a slight risk: if the source file changed at the same instant 
the last sync took place, the modification times match and we skip a copy we 
needed. With ">" we may do an extra copy when nothing has changed but the 
modification time is the same for source and target. Shouldn't we prioritize 
accuracy here?
   Any more thoughts on whether we should change this or keep ">="?
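
To make the trade-off concrete, here is a minimal sketch of the two candidate checks. It is not the code in the PR: the class and method names are hypothetical, and it assumes the check compares the target's modification time against the source's, with plain `long` epoch-millisecond values standing in for what a file status would report.

```java
/**
 * Hypothetical illustration of the ">" versus ">=" skip check.
 * Assumption: a file may be skipped when the target's modification
 * time is (strictly, or inclusively) at or after the source's.
 */
public final class ModTimeSkipSketch {

    private ModTimeSkipSketch() {
    }

    /** Strict check: skip only when the target is strictly newer than the source. */
    static boolean canSkipStrict(long sourceModTime, long targetModTime) {
        return targetModTime > sourceModTime;
    }

    /** Inclusive check: also skip when both modification times are equal. */
    static boolean canSkipInclusive(long sourceModTime, long targetModTime) {
        return targetModTime >= sourceModTime;
    }

    public static void main(String[] args) {
        long sameInstant = 1_000L;

        // Equal times: the strict check forces an extra copy,
        // the inclusive check skips (and could miss a same-instant change).
        System.out.println("equal, strict skip:    " + canSkipStrict(sameInstant, sameInstant));
        System.out.println("equal, inclusive skip: " + canSkipInclusive(sameInstant, sameInstant));

        // Source modified after the target was written: both checks copy.
        System.out.println("source newer, strict skip:    " + canSkipStrict(2_000L, 1_000L));
        System.out.println("source newer, inclusive skip: " + canSkipInclusive(2_000L, 1_000L));
    }
}
```

The two variants differ only when the modification times are equal, which is exactly the case in question: the strict form pays for a possibly redundant copy, the inclusive form accepts a small chance of skipping a file that changed at the same instant.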




> Distcp -update between different cloud stores to use modification time while 
> checking for file skip.
> ----------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-18596
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18596
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: tools/distcp
>            Reporter: Mehakmeet Singh
>            Assignee: Mehakmeet Singh
>            Priority: Major
>              Labels: pull-request-available
>
> Distcp -update currently relies on file size, block size, and checksum 
> comparisons to figure out which files should be skipped or copied. 
> Since different cloud stores have different checksum algorithms, we should 
> add modification time to the checks as well.
> This would ensure that, while performing -update, files perceived to 
> be out of sync are copied. The machines between which the file 
> transfers occur should be in time sync to avoid any extra copies.
> Testing and documentation for modification time checks between 
> different object stores should also be improved to ensure files are not 
> skipped incorrectly.


