[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14148147#comment-14148147 ]
Jing Zhao commented on HDFS-3107:
---------------------------------

Thanks for working on this, Plamen! I have not done a detailed review yet, so here are some quick comments on the design doc and patch:

# The following code is not the correct way to check whether a file is contained in snapshots. {{isWithSnapshot}} only checks whether an INodeFile has a snapshot feature, and a snapshot feature is added to a file only when the file is modified. Thus a file can be contained in snapshots without having a snapshot feature itself. So here we should use {{INode#isInLatestSnapshot}} for the check.
{code}
+    // Data will be lost after truncate occurs so it cannot support snapshots.
+    if(inodeFile.isWithSnapshot())
+      throw new HadoopIllegalArgumentException("Cannot truncate file with "
+          + "snapshots enabled.");
{code}
# Maybe I'm missing something here, but do we also need to take the new BEING_TRUNCATED state into account when processing block reports? (I guess the logic should be similar to UNDER_CONSTRUCTION.)
# Maybe we should also consider adding a configuration key to disable the functionality, just like what we did for append in the past.
# In the design doc, for handling snapshots, either approach 2 or 3 looks good to me. However, disallowing truncate only when a snapshot has been taken does not look like good semantics to me. In practice, this will cause a lot of trouble for both admins and applications using truncate. I suggest we finish the snapshot handling before committing this work to trunk or branch-2.
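
To illustrate the first point, here is a minimal toy model of the difference between the two checks. It is only a sketch: {{ToyFile}}, {{ToyNamespace}}, {{modify}} and {{checkTruncateAllowed}} are hypothetical names invented for this example, and they do not mirror the real HDFS classes or method signatures. The only thing it assumes from the discussion above is that the snapshot feature is attached lazily, on the first modification after a snapshot.

```java
// Toy model only: these are NOT the real HDFS classes or signatures.
class SnapshotCheckSketch {

    /** Hypothetical stand-in for an INodeFile. */
    static final class ToyFile {
        final String name;
        // Value of the namespace's latest snapshot id when the file was created.
        final int createdAtSnapshotId;
        // Attached lazily, only when the file is modified after a snapshot.
        boolean hasSnapshotFeature = false;

        ToyFile(String name, int createdAtSnapshotId) {
            this.name = name;
            this.createdAtSnapshotId = createdAtSnapshotId;
        }

        /** Mimics the check used in the patch: true only if the feature exists. */
        boolean isWithSnapshot() {
            return hasSnapshotFeature;
        }
    }

    /** Hypothetical stand-in for a snapshottable namespace. */
    static final class ToyNamespace {
        private int latestSnapshotId = 0; // 0 means no snapshot has been taken

        ToyFile createFile(String name) {
            return new ToyFile(name, latestSnapshotId);
        }

        void takeSnapshot() {
            latestSnapshotId++;
        }

        /** True if the file already existed when the latest snapshot was taken. */
        boolean isInLatestSnapshot(ToyFile f) {
            return f.createdAtSnapshotId < latestSnapshotId;
        }

        /** Any modification of a snapshotted file attaches the feature. */
        void modify(ToyFile f) {
            if (isInLatestSnapshot(f)) {
                f.hasSnapshotFeature = true;
            }
        }

        /** The guard expressed in terms of snapshot membership, not the feature. */
        void checkTruncateAllowed(ToyFile f) {
            if (isInLatestSnapshot(f)) {
                throw new IllegalArgumentException(
                    "Cannot truncate file captured in a snapshot: " + f.name);
            }
        }
    }

    public static void main(String[] args) {
        ToyNamespace ns = new ToyNamespace();
        ToyFile file = ns.createFile("data.log");
        ns.takeSnapshot();

        // The file is in the snapshot but has never been modified since,
        // so the feature-based check and the membership check disagree.
        System.out.println("isWithSnapshot:     " + file.isWithSnapshot());
        System.out.println("isInLatestSnapshot: " + ns.isInLatestSnapshot(file));
    }
}
```

In this toy model, a guard based on {{isWithSnapshot}} would let the truncate through for a file that was snapshotted but never modified afterwards, whereas the membership-based guard rejects it.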

> HDFS truncate
> -------------
>
>                 Key: HDFS-3107
>                 URL: https://issues.apache.org/jira/browse/HDFS-3107
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: datanode, namenode
>            Reporter: Lei Chang
>            Assignee: Plamen Jeliazkov
>         Attachments: HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch,
> HDFS-3107.patch, HDFS_truncate.pdf, HDFS_truncate_semantics_Mar15.pdf,
> HDFS_truncate_semantics_Mar21.pdf
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the
> underlying storage when a transaction is aborted. Currently HDFS does not
> support truncate (a standard Posix operation) which is a reverse operation of
> append, which makes upper layer applications use ugly workarounds (such as
> keeping track of the discarded byte range per file in a separate metadata
> store, and periodically running a vacuum process to rewrite compacted files)
> to overcome this limitation of HDFS.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)