[ 
https://issues.apache.org/jira/browse/HDFS-13660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16506347#comment-16506347
 ] 

Steve Loughran commented on HDFS-13660:
---------------------------------------

Interesting. But at least it failed. A bigger risk would be the file being 
replaced by a new file of the same size: if the read crossed a block boundary, 
you could end up with a mix of the old and new data. You'd be hard pressed to 
safely identify the problem, other than by comparing the source checksum before 
the upload began with the source checksum after it had finished.
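
To make the block-boundary risk concrete, here is a toy sketch in Python. It uses a local temporary file rather than HDFS, and the tiny `BLOCK_SIZE` and the helpers `read_block` and `simulate_mixed_read` are hypothetical names invented for illustration; it is not how DistCp reads data, only a model of a reader that fetches one block before a same-size replacement and the next block after it.

```python
import os
import tempfile

BLOCK_SIZE = 4  # hypothetical, tiny so the effect is easy to see


def read_block(path: str, index: int) -> bytes:
    """Open the file afresh and read one block, as a reader crossing a block boundary might."""
    with open(path, "rb") as f:
        f.seek(index * BLOCK_SIZE)
        return f.read(BLOCK_SIZE)


def simulate_mixed_read(old: bytes, new: bytes) -> bytes:
    """Read block 0, replace the source with same-size data, then read block 1."""
    assert len(old) == len(new) == 2 * BLOCK_SIZE
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "data-m-002")
        with open(src, "wb") as f:
            f.write(old)
        first = read_block(src, 0)
        with open(src, "wb") as f:  # file replaced mid-copy, same length
            f.write(new)
        second = read_block(src, 1)
    return first + second


copied = simulate_mixed_read(b"AAAABBBB", b"CCCCDDDD")
print(copied)            # b'AAAADDDD'
print(len(copied) == 8)  # True: a length check alone would not notice
```

The copy matches neither the old nor the new file, yet its length is exactly what a length comparison expects.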

# I think the first step here would be to document what you must not do while 
an upload is in progress: append to or replace files
# longer term: after an upload, identify whether the source has changed, warn, 
and maybe repeat the upload. That'd be with a checksum on HDFS; modified 
timestamp elsewhere
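
A minimal sketch of the longer-term idea, in Python and against the local filesystem only. The names `SourceSnapshot`, `snapshot`, and `copy_with_source_check` are hypothetical, and SHA-256 plus `os.stat` stand in for whatever checksum and modification time the real source filesystem exposes (on HDFS that would presumably be the file checksum from `FileSystem.getFileChecksum`). It is an illustration of "compare before and after, warn, maybe retry", not DistCp's implementation.

```python
import hashlib
import os
import shutil
from typing import Callable, NamedTuple


class SourceSnapshot(NamedTuple):
    length: int
    mtime_ns: int
    sha256: str


def snapshot(path: str) -> SourceSnapshot:
    """Record the length, modification time, and content checksum of a file."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    st = os.stat(path)
    return SourceSnapshot(st.st_size, st.st_mtime_ns, digest.hexdigest())


def copy_with_source_check(
    src: str,
    dst: str,
    copy: Callable[[str, str], object] = shutil.copyfile,
    max_attempts: int = 2,
) -> bool:
    """Copy src to dst; return True only if src was unchanged during a copy attempt."""
    for _ in range(max_attempts):
        before = snapshot(src)
        copy(src, dst)
        after = snapshot(src)
        if before == after:
            return True
        print(f"warning: {src} changed during the copy; repeating the upload")
    return False
```

A caller would treat a `False` result as "the destination cannot be trusted" and either fail the job or schedule another copy once writers have stopped touching the source.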


> DistCp job fails when new data is appended in the file while the distCp copy 
> job is running
> -------------------------------------------------------------------------------------------
>
>                 Key: HDFS-13660
>                 URL: https://issues.apache.org/jira/browse/HDFS-13660
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: distcp
>            Reporter: Mukund Thakur
>            Assignee: Mukund Thakur
>            Priority: Critical
>         Attachments: distcp_failure_when_file_append.log
>
>
> Steps to reproduce: 
> Suppose a DistCp MR job is copying the file /tmp/web_returns_merged/data-m-002 
> and we append some more data to this file using the command 
> hadoop fs -appendToFile xaa  /tmp/web_returns_merged/data-m-002
> The job then fails with the exception 
>  Mismatch in length of 
> source:hdfs://mycluster0/tmp/web_returns_merged/data-m-002 and target.
> The logs are attached.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

