[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137894#comment-14137894 ]
Ryan Maus commented on HDFS-3107:
---------------------------------

bq. Most HDFS users have logic to handle partial records, due to this problem. For example, Flume can roll to a new file when the old one has an error. It's pretty simple to prefix your records with a length, and simply ignore partial records that result from an incomplete flush to a file.

Using a record-length prefix is not a good way to get around this. What happens if you fail while writing the record length itself? (See the first sketch at the end of this message.)

bq. I don't see what any of this has to do with append, since this issue could equally well come up without HDFS append. Remember that HDFS append really means "reopen for write." HDFS can have an error on writing and write a partial record even without anyone reopening for write.

I would argue that this has everything to do with append. You are absolutely correct that HDFS can write a bad file on a standard open/write. The 'undo' for that failure is the delete operation: your data integrity is preserved regardless of any external factors (file format, metadata, applications, etc.). You can't have bad data if you never write bad data. The 'undo' for a reopen/write (append) failure is the truncate operation. To preserve data integrity independently of those other factors, you have to truncate back to the last known good size (we are assuming here that the existing file was written correctly). (See the second sketch at the end of this message.)

> HDFS truncate
> -------------
>
>                 Key: HDFS-3107
>                 URL: https://issues.apache.org/jira/browse/HDFS-3107
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: datanode, namenode
>            Reporter: Lei Chang
>            Assignee: Plamen Jeliazkov
>         Attachments: HDFS-3107.patch, HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the underlying storage when a transaction is aborted. Currently HDFS does not support truncate (a standard Posix operation), which is the reverse operation of append; this makes upper-layer applications use ugly workarounds (such as keeping track of the discarded byte range per file in a separate metadata store, and periodically running a vacuum process to rewrite compacted files) to overcome this limitation of HDFS.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
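First sketch: a minimal, hypothetical illustration of the length-prefix workaround quoted above (not taken from Flume or from any patch on this issue): a reader for [4-byte length][payload] records that drops a partial trailing record. The class and method names are invented for illustration. Note that the objection in the comment still applies: a torn or corrupt length prefix only lets the reader detect and discard the damaged tail; it does not restore the file itself.

{code:java}
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical reader for records written as [4-byte length][payload].
 * A partial trailing record (left by an incomplete flush) is silently dropped.
 */
public class LengthPrefixedReader {

  public static List<byte[]> readAll(InputStream in) throws IOException {
    List<byte[]> records = new ArrayList<>();
    DataInputStream data = new DataInputStream(in);
    while (true) {
      int length;
      try {
        length = data.readInt();   // EOF here: clean end of file, or a torn length prefix
      } catch (EOFException e) {
        break;
      }
      if (length < 0) {
        break;                     // corrupt prefix: all we can do is stop reading
      }
      byte[] payload = new byte[length];
      try {
        data.readFully(payload);   // EOF here: partial payload, drop it
      } catch (EOFException e) {
        break;
      }
      records.add(payload);
    }
    return records;
  }
}
{code}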
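Second sketch: the append 'undo' described in the comment, written against the FileSystem.truncate(Path, long) API this issue proposes, where truncate() returns false while block recovery is still in progress. The bookkeeping of the last known good length, the simple polling loop, and the omission of lease recovery on the failed writer are simplifications for illustration, not part of the patch.

{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/**
 * Append with rollback: if the append fails, truncate the file back to the
 * length it had before the reopen-for-write, i.e. the last known good size.
 */
public class AppendWithRollback {

  public static void appendOrRollback(FileSystem fs, Path file, byte[] record)
      throws IOException, InterruptedException {
    // Last known good size, taken before reopening the file for write.
    long lastGoodLength = fs.getFileStatus(file).getLen();
    try (FSDataOutputStream out = fs.append(file)) {
      out.write(record);
      out.hflush();
    } catch (IOException e) {
      // Undo the partial append. truncate() returns false until block
      // recovery finishes, so poll before treating the file as clean again.
      // (Lease recovery on the failed writer is glossed over here.)
      while (!fs.truncate(file, lastGoodLength)) {
        Thread.sleep(1000);
      }
      throw e;
    }
  }
}
{code}

Whether the application blocks in a polling loop like this or schedules the retry elsewhere is its own design choice; the point of the sketch is only that truncate gives the writer a way to restore the pre-append state without rewriting the whole file.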