[ 
https://issues.apache.org/jira/browse/HADOOP-17611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17319615#comment-17319615
 ] 

Adam Maroti edited comment on HADOOP-17611 at 4/13/21, 8:28 AM:
----------------------------------------------------------------

When preserving times is enabled, the copy mapper calls preserve() after
each file or file chunk is created. The CopyCommitter, which runs after the
copy mapper and performs the concat, does not call preserve() because it no
longer has the source file statuses. The concat therefore runs after the
preserve, so the modification time set by preserve() is overwritten.




was (Author: amaroti):
When times is set the preserve() function is called from the copy mapper
after the file/file junk creation. The copycomitter which runs after that
and does the concat doesn't call preserve because it no longer has the
source file statuses. So the concat happens inside of copycomitter which is
run after the copy mapper causing the concat to be run after the preserve.

Viraj Jasani (Jira) <[email protected]> wrote (date: Apr 12, 2021, Mon



> Distcp parallel file copy breaks the modification time
> ------------------------------------------------------
>
>                 Key: HADOOP-17611
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17611
>             Project: Hadoop Common
>          Issue Type: Bug
>            Reporter: Adam Maroti
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> The commit HADOOP-11794, "Enable distcp to copy blocks in parallel"
> (bf3fb585aaf2b179836e139c041fc87920a3c886), broke the modification time of
> large files.
>
> In CopyCommitter.java, concatFileChunks calls FileSystem.concat, which
> changes the modification time, so the modification times of files copied
> by distcp no longer match the source files. This only occurs for files
> large enough that distcp splits them into chunks.
> Proposed fix: in concatFileChunks, extract the modification time before
> calling concat and apply it to the concatenated result file after the
> concat (probably best before the rename()).
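
The capture-then-restore order proposed in the description can be sketched
as follows. This is only an illustration on the local filesystem with
java.nio, not the actual CopyCommitter code: the class name, the method
name concatAndRestoreMtime and the file names are hypothetical, and the
append loop merely stands in for FileSystem.concat. In Hadoop the restore
step would presumably go through FileSystem.setTimes.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.nio.file.attribute.FileTime;
import java.util.Arrays;
import java.util.List;

public class ConcatPreservingMtime {

    /**
     * Appends every chunk to target in order, deletes the chunks, and then
     * re-applies the given modification time, because the appends change it.
     * Hypothetical helper; stands in for concat followed by a times restore.
     */
    static void concatAndRestoreMtime(Path target, List<Path> chunks, FileTime sourceMtime)
            throws IOException {
        try (OutputStream out = Files.newOutputStream(target, StandardOpenOption.APPEND)) {
            for (Path chunk : chunks) {
                Files.copy(chunk, out); // writing to target updates its mtime
            }
        }
        for (Path chunk : chunks) {
            Files.delete(chunk);
        }
        // Restore the source modification time after the concat step.
        Files.setLastModifiedTime(target, sourceMtime);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("distcp-sketch");
        Path first = Files.write(dir.resolve("part0"), "hello ".getBytes());
        Path second = Files.write(dir.resolve("part1"), "world".getBytes());

        // Assumed source modification time (whole seconds, arbitrary value).
        FileTime sourceMtime = FileTime.fromMillis(1_600_000_000_000L);

        // Corresponds to the mapper's preserve step on the first chunk.
        Files.setLastModifiedTime(first, sourceMtime);

        concatAndRestoreMtime(first, Arrays.asList(second), sourceMtime);

        System.out.println(new String(Files.readAllBytes(first)));
        System.out.println(
            Files.getLastModifiedTime(first).toMillis() == sourceMtime.toMillis());
    }
}
```

Without the final setLastModifiedTime call, the target would keep the time
of the last append, which mirrors the mismatch reported in this issue.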



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
