dfs -copyToLocal should guarantee file is complete
--------------------------------------------------

                 Key: HADOOP-1292
                 URL: https://issues.apache.org/jira/browse/HADOOP-1292
             Project: Hadoop
          Issue Type: Improvement
          Components: dfs
            Reporter: eric baldeschwieler


We should copy to a temporary file, maybe _tmp.<realname>, and then rename it 
to <realname> only when the copy is complete.  Restarting a copy should reuse 
the _tmp file after checksumming it.  Then interrupting a copy with ^C will 
never leave a partial file under the real name.
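
A minimal sketch of the copy-then-rename idea, using only java.nio.file from 
the JDK.  This is an illustration, not the DFS client code: the class name 
AtomicLocalCopy, the method copyThenRename, and the plain InputStream standing 
in for the DFS read stream are all assumptions, and the restart/checksum step 
is not shown.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper; the names here are illustrative, not Hadoop APIs.
class AtomicLocalCopy {
    // Prefix taken from the "_tmp.<realname>" suggestion above.
    static final String TMP_PREFIX = "_tmp.";

    /** Copies src to dest so that dest only appears once the copy has finished. */
    static void copyThenRename(InputStream src, Path dest) throws IOException {
        // Keep the temporary file next to dest so the rename stays on one file system.
        Path tmp = dest.resolveSibling(TMP_PREFIX + dest.getFileName());

        // An interrupted copy leaves only the _tmp file behind, never a partial dest.
        Files.copy(src, tmp, StandardCopyOption.REPLACE_EXISTING);

        // Rename in a single step. Whether an existing dest is replaced or an
        // IOException is thrown is implementation specific for ATOMIC_MOVE.
        Files.move(tmp, dest, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

A caller would pass the stream it is reading from the cluster and the final 
local path, for example copyThenRename(in, Paths.get("part-00000")).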

Original suggestion:

On Apr 23, 2007, at 2:38 AM, Richard Kasperski wrote:

I'd like to have a guarantee that a file copy is both completed and that the 
file is whole. In the past I've done this by copying the file to a temporary 
name tmp.<realname> and then moving it to <realname> once the file copy is 
complete. This has a very nice property: if <realname> exists, then the file 
copy is complete and I'm not looking at a partial copy of the file. I believe 
that the copy to the cluster already has both of these guarantees, in that the 
file doesn't appear in a DFS directory until the whole file has been copied. 
The copy from the cluster to a local file system does not have these 
guarantees, and it would be very nice if it did. There are two scenarios in 
which I wish to use this. First, if I ctrl-c the 'hadoop dfs -copyToLocal', I 
know which parts are complete and which parts aren't. Second, I can run a 
background compressor to compress the files as they are copied.
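
To illustrate the second scenario, here is a rough sketch of how a background 
job might pick out only the finished files.  It assumes the "tmp." prefix 
described above; the class name CompletedFiles and the method list are 
hypothetical, and the compression step itself is left out.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for a background consumer; not part of Hadoop.
class CompletedFiles {
    // Assumed naming convention: in-progress copies are named tmp.<realname>.
    static final String TMP_PREFIX = "tmp.";

    /** Returns the regular files in dir whose names do not carry the temporary prefix. */
    static List<Path> list(Path dir) throws IOException {
        List<Path> done = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path p : entries) {
                String name = p.getFileName().toString();
                // A file under its real name is treated as fully copied.
                if (Files.isRegularFile(p) && !name.startsWith(TMP_PREFIX)) {
                    done.add(p);
                }
            }
        }
        return done;
    }
}
```

A compressor could poll list(dir) and process each returned path, knowing it 
will never pick up a file that is still being written.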


