[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237430#comment-13237430 ]

Lei Chang commented on HDFS-3107:
---------------------------------

> The proposed way to go about #3 by creating copies at the DN level and 
> truncating there seems messy, but if you think about it as a variant of #2 
> that leaks less information into the API (block boundaries, contents of last 
> segment), it seems simpler to me.

Agree with you: if we only look at the simplicity of the internal RPC APIs, #3 
is simpler. However, on the implementation side, #3 requires the client to work 
with both the NN and the DNs, and there are many cases the client must handle 
when some nodes fail during the copy/truncate phase while others succeed (a 
rough sketch of this flow follows below). For example:
1) The client has to work with the NN to handle the failures and perform 
recovery when a DN fails. This is somewhat like the pipeline rebuild and 
recovery in the APPEND case.
2) A client failure also introduces extra work. (#1 has to deal with this case 
too, but it is simpler there.)

Thus, #1 should be easier to implement.
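
To make the extra client-side coordination in #3 concrete, here is a minimal 
sketch. None of these interfaces or names (NameNodeView, DataNodeView, 
startTruncate, copyAndTruncateBlock, etc.) exist in HDFS; they are invented 
purely to show how many steps the client would own and where the partial 
failures from cases 1) and 2) land.

{code:java}
// Hypothetical sketch only: illustrates the client-driven flow of #3,
// not an actual HDFS API or the proposed implementation.
interface NameNodeView {
    LastBlockInfo startTruncate(String src, long newLength); // lease + last block info
    void commitTruncate(String src, long newLength);         // publish new length
    void abortTruncate(String src);                          // recovery path
}

interface DataNodeView {
    // Copy the last block and truncate the copy to the new in-block length.
    void copyAndTruncateBlock(String blockId, long newBlockLength);
}

class TruncateClientSketch {
    void truncate(NameNodeView nn, Iterable<DataNodeView> replicas,
                  String src, long newLength) {
        LastBlockInfo last = nn.startTruncate(src, newLength);
        try {
            // The extra work #1 avoids: the client must drive every replica.
            for (DataNodeView dn : replicas) {
                // If a DN fails here, some replicas hold the truncated copy and
                // some do not; the client must coordinate recovery with the NN,
                // much like pipeline rebuild/recovery in the APPEND case (case 1).
                dn.copyAndTruncateBlock(last.blockId, last.newBlockLength);
            }
            nn.commitTruncate(src, newLength);
        } catch (RuntimeException e) {
            // If the client itself dies before commit (case 2), the NN must
            // detect and clean up the half-finished truncate; abort explicitly
            // when we still can.
            nn.abortTruncate(src);
            throw e;
        }
    }
}

class LastBlockInfo {
    final String blockId;
    final long newBlockLength;
    LastBlockInfo(String blockId, long newBlockLength) {
        this.blockId = blockId;
        this.newBlockLength = newBlockLength;
    }
}
{code}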

You raised a good point about the security of the temporary file: it should be 
created with the same access privileges as the file being truncated.
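
A minimal sketch of what "same access privileges" could look like, assuming the 
temporary file is created through the public org.apache.hadoop.fs.FileSystem 
API. The class and method names here are invented for illustration; only the 
FileSystem/FileStatus calls are real.

{code:java}
import java.io.IOException;

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TempFilePermissionSketch {
    /**
     * Create the temporary target with the same permission, replication and
     * block size as the file being truncated, so it leaks nothing to users
     * who cannot read the original.
     */
    static FSDataOutputStream createTempLike(FileSystem fs, Path original, Path temp)
            throws IOException {
        FileStatus st = fs.getFileStatus(original);
        return fs.create(temp,
                st.getPermission(),   // same access privileges as the original
                false,                // fail rather than overwrite an existing temp
                fs.getConf().getInt("io.file.buffer.size", 4096),
                st.getReplication(),
                st.getBlockSize(),
                null);                // no progress callback needed here
    }
}
{code}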


                
> HDFS truncate
> -------------
>
>                 Key: HDFS-3107
>                 URL: https://issues.apache.org/jira/browse/HDFS-3107
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: data-node, name-node
>            Reporter: Lei Chang
>         Attachments: HDFS_truncate_semantics_Mar15.pdf, 
> HDFS_truncate_semantics_Mar21.pdf
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently, HDFS does not 
> support truncate (a standard POSIX operation), the reverse operation of 
> append. This forces upper-layer applications to use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.
