[jira] [Commented] (HDFS-3107) HDFS truncate

Jing Zhao (JIRA) Thu, 25 Sep 2014 11:59:50 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14148147#comment-14148147
 ]


Jing Zhao commented on HDFS-3107:
---------------------------------

Thanks for working on this, Plamen! I have not done a detailed review yet, but 
here are some quick comments on the design doc and patch:
# The following code is not the correct way to check whether a file is 
contained in snapshots. {{isWithSnapshot}} only checks whether an INodeFile has 
a snapshot feature, and a snapshot feature is added to a file only when the 
file is modified. Thus a file can be contained in snapshots without having a 
snapshot feature itself. So here we should use {{INode#isInLatestSnapshot}} for 
the check.
{code}
+    // Data will be lost after truncate occurs so it cannot support snapshots.
+    if(inodeFile.isWithSnapshot())
+      throw new HadoopIllegalArgumentException("Cannot truncate file with " +
+          "snapshots enabled.");
{code}
# Maybe I'm missing something here, but do we also need to take the new 
BEING_TRUNCATED state into account when processing block reports? I guess the 
logic should be similar to that for UNDER_CONSTRUCTION.
# Maybe we should also consider adding a configuration key to disable the 
functionality, as we did for append in the past.
# In the design doc, for handling snapshots, either approach 2 or 3 looks good 
to me. However, disallowing truncate only when a snapshot has been taken does 
not look like good semantics to me. In practice, it will cause a lot of trouble 
for both admins and applications using truncate. I suggest we finish the 
snapshot handling before committing this work to trunk or branch-2.
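
To illustrate the first point, here is a minimal, self-contained sketch. It 
does not use the real HDFS classes: {{ToyFile}}, {{ToyDir}} and all of their 
members are hypothetical stand-ins that only model the behavior described 
above, namely that the snapshot feature is attached to a file lazily on 
modification, while membership in the latest snapshot is a separate question.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: these classes are hypothetical and are NOT the HDFS
// INodeFile / snapshot implementation.
final class SnapshotCheckSketch {

    /** Stand-in for a file inode. */
    static final class ToyFile {
        final String name;
        // Assumption from the comment above: set only when the file is
        // modified while it is captured in a snapshot.
        boolean hasSnapshotFeature = false;

        ToyFile(String name) { this.name = name; }
    }

    /** Stand-in for a snapshottable directory. */
    static final class ToyDir {
        final List<ToyFile> children = new ArrayList<>();
        // Each snapshot records which files existed when it was taken.
        final List<List<ToyFile>> snapshots = new ArrayList<>();

        ToyFile create(String name) {
            ToyFile f = new ToyFile(name);
            children.add(f);
            return f;
        }

        void takeSnapshot() {
            snapshots.add(new ArrayList<>(children));
        }

        /** True if the latest snapshot (if any) captured this file. */
        boolean isInLatestSnapshot(ToyFile f) {
            if (snapshots.isEmpty()) {
                return false;
            }
            return snapshots.get(snapshots.size() - 1).contains(f);
        }

        /** A modification is what attaches the snapshot feature. */
        void modify(ToyFile f) {
            if (isInLatestSnapshot(f)) {
                f.hasSnapshotFeature = true;
            }
        }
    }

    /** Mirrors the shape of the check in the patch: looks at the feature only. */
    static boolean featureOnlyCheck(ToyFile f) {
        return f.hasSnapshotFeature;
    }

    public static void main(String[] args) {
        ToyDir dir = new ToyDir();
        ToyFile file = dir.create("data.log");
        dir.takeSnapshot();   // file is now captured, but was never modified

        System.out.println("feature-only check:  " + featureOnlyCheck(file));
        System.out.println("isInLatestSnapshot: " + dir.isInLatestSnapshot(file));
    }
}
```

In this toy model the feature-only check reports false for a file that was 
snapshotted but never modified, so a truncate guarded by it would go through 
and the snapshot's data would be lost. The membership check catches that case.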

> HDFS truncate
> -------------
>
>                 Key: HDFS-3107
>                 URL: https://issues.apache.org/jira/browse/HDFS-3107
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: datanode, namenode
>            Reporter: Lei Chang
>            Assignee: Plamen Jeliazkov
>         Attachments: HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf, 
> HDFS_truncate_semantics_Mar21.pdf
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard Posix operation) which is a reverse operation of 
> append, which makes upper layer applications use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
