[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14135529#comment-14135529 ]
Ryan Maus commented on HDFS-3107:
---------------------------------

_> What are the use-cases? Right now we have discussed only one use-case: allowing users to remove data they accidentally appended._

This is not a particularly good use case, and certainly not enough to justify the change we are discussing here. You can't protect users from every little potential accident.

However, an extremely common use case is *failed* appends, where the writing application dies, is killed, etc., leaving corrupted or incomplete data in HDFS. The big problem here is that most HDFS guides recommend storing large amounts of data in each file (e.g. in a typical log-file setup, one simply appends each day's new data onto a much larger existing file). Without a truncate command, this pattern is essentially unsupported: sooner or later you are going to hit an error during an append, and fixing it will require you to copy-truncate a massive file.

Append should never have been implemented without truncate.

> HDFS truncate
> -------------
>
> Key: HDFS-3107
> URL: https://issues.apache.org/jira/browse/HDFS-3107
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: datanode, namenode
> Reporter: Lei Chang
> Assignee: Plamen Jeliazkov
> Attachments: HDFS-3107.patch, HDFS_truncate_semantics_Mar15.pdf,
> HDFS_truncate_semantics_Mar21.pdf
>
> Original Estimate: 1,344h
> Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the
> underlying storage when a transaction is aborted. Currently HDFS does not
> support truncate (a standard POSIX operation), the reverse operation of
> append, which forces upper-layer applications into ugly workarounds (such as
> keeping track of the discarded byte range per file in a separate metadata
> store, and periodically running a vacuum process to rewrite compacted files)
> to overcome this limitation of HDFS.
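To make the cost of the failed-append recovery concrete, here is a minimal sketch in Python against a local POSIX file system (not the HDFS API). It assumes the writer recorded the last known-good file length before starting the append; `copy_truncate` and `posix_truncate` are hypothetical helpers written for illustration, contrasting the copy-truncate workaround with a standard POSIX truncate.

```python
import os
import shutil
import tempfile


def copy_truncate(path: str, good_length: int, chunk_size: int = 1 << 20) -> int:
    """Hypothetical workaround for a store without truncate: copy the first
    `good_length` bytes into a new file, then swap it in for the original.
    Cost grows with the size of the whole file, not the size of the bad tail."""
    directory = os.path.dirname(os.path.abspath(path))
    # Temp file in the same directory so the final rename stays on one file system.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    copied = 0
    try:
        with open(path, "rb") as src, os.fdopen(fd, "wb") as dst:
            while copied < good_length:
                chunk = src.read(min(chunk_size, good_length - copied))
                if not chunk:
                    break  # original is already shorter than good_length
                dst.write(chunk)
                copied += len(chunk)
        shutil.copymode(path, tmp_path)  # keep the original permission bits
        os.replace(tmp_path, path)  # atomic rename on POSIX
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return copied


def posix_truncate(path: str, good_length: int) -> None:
    """With a truncate primitive, the bad tail is cut off in place."""
    os.truncate(path, good_length)
```

Both helpers leave the same bytes behind, but `copy_truncate` rewrites the entire good prefix even when the corrupted tail is only a few bytes, while `posix_truncate` does not need to rewrite the prefix at all.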