[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137882#comment-14137882 ]
Colin Patrick McCabe commented on HDFS-3107:
--------------------------------------------

bq. The main use case as far as I understand from this and other conversations is transaction handling for external databases. DB writes its transactions into a HDFS file. While transactions succeed, the DB keeps writing to the same file. But when a tx fails, it is aborted and the file is truncated to the previous successful transaction.

As I mentioned earlier, the external database could simply use length-prefixed records. Then, if it encounters a partial record, the record is ignored (see the sketch at the end of this message). Flume has been doing something like this for a while. So I don't see this as a very important use case.

bq. There is no divergent history. If you truncate, you lose the data that you truncated. You will not be able to open the file for append until the truncate is actually completed and the DNs shrink the last block replicas. Then the file can be opened for append to add new data.

OK, that's fair. The single-writer nature of HDFS makes this easier.

There are also interactions with files open for read. The NameNode doesn't know which files are open for read, so you cannot forbid this. Keep in mind that there is a generous amount of buffering inside DFSInputStream. So following a truncate plus an append of new data, we may continue to read the "old" truncated data from the buffer inside the DFSInputStream's {{RemoteBlockReader2}} for a while. That is partly what I meant by a "divergent history." This is probably OK, but the semantics need to be spelled out in the design doc.

bq. How this interacts with snapshots... is something yet to be designed

OK.

bq. There is a patch attached. Did you have a chance to review? It is much simpler than append, but it does not allow to truncate files in snapshots. If we decide to implement a copy-on-write approach for truncated files in snapshots, then we may end up creating a branch.

I'm -1 on committing anything without a design doc. I apologize if this seems harsh, but I don't want there to be any ambiguity. I think you are on the right track, but let's see a complete design and then get started with committing the code. Thanks, Konstantin.

> HDFS truncate
> -------------
>
>                 Key: HDFS-3107
>                 URL: https://issues.apache.org/jira/browse/HDFS-3107
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: datanode, namenode
>            Reporter: Lei Chang
>            Assignee: Plamen Jeliazkov
>         Attachments: HDFS-3107.patch, HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the underlying storage when a transaction is aborted. Currently HDFS does not support truncate (a standard POSIX operation), which is the reverse operation of append; this makes upper-layer applications use ugly workarounds (such as keeping track of the discarded byte range per file in a separate metadata store, and periodically running a vacuum process to rewrite compacted files) to overcome this limitation of HDFS.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
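For concreteness, here is a minimal sketch of the length-prefixed record scheme mentioned above. It uses plain {{java.io}} streams for brevity; the same pattern applies unchanged over Hadoop's {{FSDataOutputStream}} and {{FSDataInputStream}}. The class and method names are illustrative only, not code from the attached patch:

{code:java}
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;

/**
 * Sketch of length-prefixed records: each record is a 4-byte
 * big-endian length followed by the payload. A reader that hits
 * EOF mid-record treats the tail as a partial (aborted) record
 * and ignores it, so no truncate operation is needed to discard
 * an incomplete transaction.
 */
public class LengthPrefixedRecords {

  static void writeRecord(DataOutputStream out, byte[] payload)
      throws IOException {
    out.writeInt(payload.length);   // 4-byte length prefix
    out.write(payload);             // record body
  }

  /** Returns the next complete record, or null at EOF / partial record. */
  static byte[] readRecord(DataInputStream in) throws IOException {
    int len;
    try {
      len = in.readInt();           // may hit EOF at or inside the prefix
    } catch (EOFException e) {
      return null;                  // truncated length prefix: ignore tail
    }
    byte[] payload = new byte[len];
    try {
      in.readFully(payload);
    } catch (EOFException e) {
      return null;                  // truncated body: ignore partial record
    }
    return payload;
  }
}
{code}

Because the reader can detect and skip a trailing partial record on its own, an aborted transaction leaves nothing that must be physically removed from the file, which is why this pattern weakens the transaction-handling argument for truncate.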