[ 
https://issues.apache.org/jira/browse/HADOOP-17611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17319615#comment-17319615
 ] 

Adam Maroti commented on HADOOP-17611:
--------------------------------------

When preserving times is set, the preserve() function is called from the copy
mapper after the file/file chunk creation. The CopyCommitter, which runs after
that and does the concat, doesn't call preserve() because it no longer has the
source file statuses. So the concat happens inside the CopyCommitter, which
runs after the copy mapper, causing the concat to run after the preserve.

Viraj Jasani (Jira) <j...@apache.org> wrote (on: 2021 Apr 12, Mon



> Distcp parallel file copy breaks the modification time
> ------------------------------------------------------
>
>                 Key: HADOOP-17611
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17611
>             Project: Hadoop Common
>          Issue Type: Bug
>            Reporter: Adam Maroti
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> The commit HADOOP-11794 (Enable distcp to copy blocks in parallel, 
> bf3fb585aaf2b179836e139c041fc87920a3c886) broke the modification time of 
> large files.
>  
> In CopyCommitter.java, inside concatFileChunks, FileSystem.concat is called, 
> which changes the modification time; therefore the modification times of files 
> copied by distcp will not match the source files. However, this only occurs 
> for files large enough to be split up by distcp during the copy.
> In concatFileChunks, before calling concat, extract the modification time and 
> apply it to the concatenated result file after the concat (probably best 
> before the rename()).
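The proposed fix can be sketched outside of Hadoop as well. Below is a minimal
local-filesystem analogue using java.nio.file (not the actual DistCp code; the
class and method names are hypothetical): the target chunk's modification time
is captured before the append-based "concat" and re-applied afterwards, since
the concat itself bumps the mtime that preserve() had already set.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.nio.file.attribute.FileTime;

// Hypothetical sketch of the idea behind the fix: preserve the first chunk's
// modification time across the concat, because the concat operation updates it.
public class ConcatPreservesMtime {

    // Appends each chunk to target, then restores target's pre-concat mtime.
    static void concatKeepingMtime(Path target, Path... chunks) throws IOException {
        // Capture the modification time that preserve() would already have set.
        FileTime preserved = Files.getLastModifiedTime(target);
        for (Path chunk : chunks) {
            Files.write(target, Files.readAllBytes(chunk), StandardOpenOption.APPEND);
        }
        // Re-apply the captured time after the concat (before any rename step).
        Files.setLastModifiedTime(target, preserved);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("concat-demo");
        Path target = Files.write(dir.resolve("part0"), "aaa".getBytes());
        Path chunk = Files.write(dir.resolve("part1"), "bbb".getBytes());

        FileTime before = FileTime.fromMillis(1_000_000_000_000L);
        Files.setLastModifiedTime(target, before);

        concatKeepingMtime(target, chunk);

        if (!Files.getLastModifiedTime(target).equals(before)) {
            throw new AssertionError("mtime was not preserved across concat");
        }
        System.out.println("mtime preserved: " + Files.getLastModifiedTime(target));
    }
}
```

In the real CopyCommitter the same capture/re-apply pair would wrap the
FileSystem.concat call, using the source FileStatus as the time source.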



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
