[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14135529#comment-14135529 ]

Ryan Maus commented on HDFS-3107:
---------------------------------

_> What are the use-cases. Right now we have discussed only one use-case: 
allowing users to remove data they accidentally appended._

This is not a particularly good use case, certainly not enough to justify the 
change we are discussing here.  You can't protect users from every little 
potential accident.

However, an extremely common use case is *failed* appends, where the writing 
application dies, is killed, etc., leaving corrupted or incomplete data at the 
end of the file in HDFS.  The big problem here is that most HDFS guides suggest 
storing large amounts of data in each file (e.g. in a typical log-file setup, 
one simply appends each day's new data onto a much larger existing file).  
Without a truncate command, that pattern is essentially unsupported.

Sooner or later you are going to have some kind of error during an append, and 
fixing it will require you to copy-truncate a massive file: rewrite the good 
prefix into a new file and swap it in place of the original.  Append should 
never have been implemented without truncate.

> HDFS truncate
> -------------
>
>                 Key: HDFS-3107
>                 URL: https://issues.apache.org/jira/browse/HDFS-3107
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: datanode, namenode
>            Reporter: Lei Chang
>            Assignee: Plamen Jeliazkov
>         Attachments: HDFS-3107.patch, HDFS_truncate_semantics_Mar15.pdf, 
> HDFS_truncate_semantics_Mar21.pdf
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. HDFS currently does not 
> support truncate (a standard POSIX operation), the reverse of append, so 
> upper-layer applications have to use ugly workarounds (such as keeping track 
> of the discarded byte range per file in a separate metadata store, and 
> periodically running a vacuum process to rewrite compacted files) to 
> overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
