[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137894#comment-14137894 ]

Ryan Maus commented on HDFS-3107:
---------------------------------

bq. Most HDFS users have logic to handle partial records, due to this problem. 
For example, Flume can roll to a new file when the old one has an error. It's 
pretty simple to prefix your records with a length, and simply ignore partial 
records that result from an incomplete flush to a file.

Using a record-length prefix is not a robust way to get around this.  What 
happens if the write fails while you are writing the record length itself?
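For illustration, here is a minimal sketch of the length-prefix framing being 
discussed (the 4-byte big-endian length and the 64 MB sanity bound are 
assumptions of this sketch, not anything HDFS or Flume mandates).  It shows 
how a reader can drop a partial record at the tail of a file, and also where 
the scheme is weak: if the flush dies inside the length field itself, the 
reader sees garbage that only a plausibility check or a checksum can catch.

{code:java}
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

/**
 * Illustrative reader for length-prefixed records (4-byte big-endian length
 * followed by the payload). This framing is an application-level convention,
 * not an HDFS feature. A torn record at the tail of the stream is dropped.
 */
public class LengthPrefixedReader {

    /** Returns the next payload, or null at a clean or torn end of stream. */
    public static byte[] readRecord(DataInputStream in) throws IOException {
        int length;
        try {
            length = in.readInt();          // the length itself may be a torn write
        } catch (EOFException e) {
            return null;                    // clean EOF, or fewer than 4 bytes left
        }
        // If the flush died inside the length field, 'length' is garbage and
        // may still look like a plausible record size; a sanity bound is the
        // only defense short of a per-record checksum.
        if (length < 0 || length > 64 * 1024 * 1024) {
            return null;                    // treat as a corrupt tail
        }
        byte[] payload = new byte[length];
        try {
            in.readFully(payload);
        } catch (EOFException e) {
            return null;                    // partial payload: ignore the tail
        }
        return payload;
    }
}
{code}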

bq. I don't see what any of this has to do with append, since this issue could 
equally well come up without HDFS append. Remember that HDFS append really 
means "reopen for write." HDFS can have an error on writing and write a partial 
record even without anyone reopening for write.

I would argue that this has everything to do with append.  You are absolutely 
correct that HDFS can write a bad file on a standard open/write.  The 'undo' 
for that failure is the delete operation.  Your data integrity is preserved 
regardless of any external factors (file format, metadata, applications, etc.).  
You can't end up with bad data if you never keep bad data.

The 'undo' for a reopen/write (append) failure is the truncate operation.  To 
preserve data integrity independently of other factors, you have to truncate 
back to the last known good size (we are assuming here that the existing file 
was written correctly).
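As a sketch of that recovery pattern, assuming a truncate API of the shape 
proposed in this issue, e.g. FileSystem.truncate(Path, long) (the exact API 
was still under discussion at the time of this comment, and the class and 
method names below are illustrative): record the file length before reopening 
for append, and roll back to it if the append fails.

{code:java}
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/**
 * Sketch of "truncate back to the last known good size" recovery for a
 * writer whose append fails partway through a record.
 */
public class AppendWithUndo {

    public static void appendRecord(FileSystem fs, Path file, byte[] record)
            throws Exception {
        // Remember the length of the (assumed good) existing file.
        long goodLength = fs.getFileStatus(file).getLen();
        try (FSDataOutputStream out = fs.append(file)) {
            out.write(record);
            out.hflush();                 // make the record durable on the datanodes
        } catch (Exception e) {
            // Undo the partial append: roll the file back to its old length.
            boolean done = fs.truncate(file, goodLength);
            if (!done) {
                // Truncate is completing asynchronously (last-block recovery);
                // a real writer would wait until the file is readable again
                // before retrying the append.
                System.err.println("Waiting for block recovery on " + file);
            }
            throw e;
        }
    }
}
{code}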

> HDFS truncate
> -------------
>
>                 Key: HDFS-3107
>                 URL: https://issues.apache.org/jira/browse/HDFS-3107
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: datanode, namenode
>            Reporter: Lei Chang
>            Assignee: Plamen Jeliazkov
>         Attachments: HDFS-3107.patch, HDFS_truncate_semantics_Mar15.pdf, 
> HDFS_truncate_semantics_Mar21.pdf
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard POSIX operation), which is the reverse of 
> append. This forces upper-layer applications to use ugly workarounds (such 
> as keeping track of the discarded byte range per file in a separate 
> metadata store, and periodically running a vacuum process to rewrite 
> compacted files) to overcome this limitation of HDFS.


