Tsz Wo (Nicholas), SZE created HDFS-4529:
--------------------------------------------

             Summary: Decide the semantic of concat with snapshots
                 Key: HDFS-4529
                 URL: https://issues.apache.org/jira/browse/HDFS-4529
             Project: Hadoop HDFS
          Issue Type: Sub-task
          Components: namenode
            Reporter: Tsz Wo (Nicholas), SZE
            Assignee: Tsz Wo (Nicholas), SZE


The use case of concat is copying large files across clusters using the 
following steps:

- Step 1: The blocks of a file in the source cluster are copied in parallel to 
transient files in the destination cluster.
- Step 2: Then the transient files in the destination cluster are concatenated 
in order to obtain the original file.
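The two steps above can be sketched as a small in-memory simulation. This is only an illustration under stated assumptions. The destination cluster is modeled as a dict from path to bytes. The helper names (split_into_blocks, copy_block, concat, copy_large_file), the `.partN` naming scheme and the 4-byte block size are all hypothetical. In HDFS itself, `FileSystem.concat(trg, psrcs)` moves the blocks of the source files onto an existing target file and removes the sources. The toy `concat` below mimics that by appending bytes and deleting the transient entries.

```python
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 4  # hypothetical, tiny block size so the example stays readable


def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Split the source file into fixed-size blocks (the last one may be short)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def copy_block(dest: dict[str, bytes], name: str, block: bytes) -> None:
    """Step 1 worker: write one block to its own transient file."""
    dest[name] = block


def concat(dest: dict[str, bytes], target: str, sources: list[str]) -> None:
    """Step 2 (toy): append the sources to an existing target in order, then delete them."""
    dest[target] += b"".join(dest[s] for s in sources)
    for s in sources:
        del dest[s]


def copy_large_file(data: bytes, dest: dict[str, bytes], target: str,
                    block_size: int = BLOCK_SIZE) -> None:
    blocks = split_into_blocks(data, block_size)
    parts = [f"{target}.part{i}" for i in range(len(blocks))]
    # Step 1: copy the blocks in parallel to transient files.
    with ThreadPoolExecutor() as pool:
        list(pool.map(lambda pb: copy_block(dest, *pb), zip(parts, blocks)))
    # Step 2: concatenate the transient files, in order, into the target.
    dest[target] = b""  # the toy concat expects the target to exist already
    concat(dest, target, parts)
```

For example, `copy_large_file(b"hello, distributed world", dest, "/dst/file")` on an empty `dest` leaves only `"/dst/file"` in it. Between the two steps the `.partN` entries are ordinary files, which is exactly the window in which a snapshot could capture them.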

If a snapshot is taken in the destination cluster before Step 2, some transient 
files may be captured in the snapshot.  Then what should happen?  The following 
are some alternatives:

* (1) fail concat and keep the transient files in the snapshots;
* (2) allow concat and keep the transient files in the snapshots;
* (3) allow concat but remove the transient files from all snapshots.

None of the solutions above is perfect.  Here are their drawbacks:

For (1) and (2), the transient files will remain in the system until the 
snapshots are deleted.  This is inefficient, since the files are known to be 
transient.  (1) may force users to create such files under a 
non-snapshottable tmp directory in the first place.  However, this complicates 
user applications, and existing applications may need to be updated for the 
new policy.  Also, a non-snapshottable directory may not exist, since an admin 
may set the root directory of the file system to be snapshottable.  (3) seems 
to break the read-only snapshot contract: some files that appear in a snapshot 
may disappear later on.
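The three alternatives and their drawbacks can be compared with a toy model. This is a sketch under loud assumptions, not the HDFS snapshot implementation. A snapshot is modeled as a full copy of a path-to-bytes dict. `ToyNamespace`, `Policy`, `ConcatRejected` and `run` are hypothetical names. For each policy, `run` reports whether concat succeeded, which paths are now held only by snapshots (the storage cost of (1) and (2)), and whether the snapshot's contents changed after it was taken (the contract violation of (3)).

```python
from enum import Enum


class Policy(Enum):
    FAIL = 1          # (1) fail concat, keep the transient files in the snapshots
    ALLOW_KEEP = 2    # (2) allow concat, keep the transient files in the snapshots
    ALLOW_REMOVE = 3  # (3) allow concat, remove the transient files from all snapshots


class ConcatRejected(Exception):
    pass


class ToyNamespace:
    def __init__(self) -> None:
        self.files: dict[str, bytes] = {}
        self.snapshots: list[dict[str, bytes]] = []

    def take_snapshot(self) -> int:
        # A snapshot here is simply a frozen copy of the live namespace.
        self.snapshots.append(dict(self.files))
        return len(self.snapshots) - 1

    def concat(self, target: str, sources: list[str], policy: Policy) -> None:
        captured = [s for s in sources if any(s in snap for snap in self.snapshots)]
        if captured and policy is Policy.FAIL:
            raise ConcatRejected(f"sources captured in snapshots: {captured}")
        self.files[target] += b"".join(self.files[s] for s in sources)
        for s in sources:
            del self.files[s]
        if policy is Policy.ALLOW_REMOVE:
            # Rewriting history: the snapshots lose files they used to contain.
            for snap in self.snapshots:
                for s in sources:
                    snap.pop(s, None)

    def snapshot_only_paths(self) -> list[str]:
        """Paths still held by some snapshot but no longer in the live namespace."""
        in_snapshots = set().union(*self.snapshots)
        return sorted(in_snapshots - set(self.files))


def run(policy: Policy) -> tuple[bool, list[str], bool]:
    ns = ToyNamespace()
    ns.files = {"/dst/f": b"", "/dst/f.part0": b"ab", "/dst/f.part1": b"cd"}
    sid = ns.take_snapshot()  # snapshot taken before Step 2
    before = dict(ns.snapshots[sid])
    try:
        ns.concat("/dst/f", ["/dst/f.part0", "/dst/f.part1"], policy)
        ok = True
    except ConcatRejected:
        ok = False
    return ok, ns.snapshot_only_paths(), ns.snapshots[sid] != before
```

In this model, `run(Policy.FAIL)` returns `(False, [], False)`: concat is refused and the transient files stay live. `run(Policy.ALLOW_KEEP)` returns `(True, ["/dst/f.part0", "/dst/f.part1"], False)`: the transient files survive only in the snapshot. `run(Policy.ALLOW_REMOVE)` returns `(True, [], True)`: no storage is wasted, but the snapshot has changed after it was taken.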


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
