[ https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14137882#comment-14137882 ]
Colin Patrick McCabe commented on HDFS-3107:
--------------------------------------------

bq. The main use case as far as I understand from this and other conversations is transaction handling for external databases. DB writes its transactions into a HDFS file. While transactions succeed, the DB keeps writing to the same file. But when a tx fails, it is aborted and the file is truncated to the previous successful transaction.

As I mentioned earlier, the external database could simply use length-prefixed records. Then, if it encounters a partial record, the record is ignored (see the sketch at the end of this message). Flume has been doing something like this for a while. So I don't see this as a very important use case.

bq. There is no divergent history. If you truncate, you lose the data that you truncated. You will not be able to open the file for append until the truncate is actually completed and the DNs shrink the last block replicas. Then the file can be opened for append to add new data.

OK, that's fair. The single-writer nature of HDFS makes this easier.

There are also interactions with files open for read. The NameNode doesn't know which files are open for read, so you cannot forbid this. Keep in mind that there is a generous amount of buffering inside DFSInputStream. So following a truncate plus an append of new data, we may continue to read the "old" truncated data from the buffer inside the DFSInputStream's {{RemoteBlockReader2}} for a while. That is partly what I meant by a "divergent history." This is probably OK, but the semantics need to be spelled out in the design doc.

bq. How this interacts with snapshots... is something yet to be designed

OK.

bq. There is a patch attached. Did you have a chance to review? It is much simpler than append, but it does not allow to truncate files in snapshots. If we decide to implement a copy-on-write approach for truncated files in snapshots, then we may end up creating a branch.

I'm -1 on committing anything without a design doc. I apologize if this seems harsh, but I don't want there to be any ambiguity. I think you are on the right track, but let's see a complete design and then get started with committing the code. Thanks, Konstantin.

> HDFS truncate
> -------------
>
>                 Key: HDFS-3107
>                 URL: https://issues.apache.org/jira/browse/HDFS-3107
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: datanode, namenode
>            Reporter: Lei Chang
>            Assignee: Plamen Jeliazkov
>         Attachments: HDFS-3107.patch, HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the underlying storage when a transaction is aborted. Currently HDFS does not support truncate (a standard POSIX operation), which is the reverse operation of append; this makes upper-layer applications use ugly workarounds (such as keeping track of the discarded byte range per file in a separate metadata store, and periodically running a vacuum process to rewrite compacted files) to overcome this limitation of HDFS.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
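For concreteness, here is a minimal sketch of the length-prefixed record scheme mentioned above. It uses plain {{java.io}} streams for brevity; the same pattern applies unchanged over Hadoop's {{FSDataOutputStream}} and {{FSDataInputStream}}. The class and method names are illustrative only, not code from the attached patch:

{code:java}
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;

/**
 * Sketch of length-prefixed records: each record is a 4-byte
 * big-endian length followed by the payload. A reader that hits
 * EOF mid-record treats the tail as a partial (aborted) record
 * and ignores it, so no truncate operation is needed to discard
 * an incomplete transaction.
 */
public class LengthPrefixedRecords {

  static void writeRecord(DataOutputStream out, byte[] payload)
      throws IOException {
    out.writeInt(payload.length);   // 4-byte length prefix
    out.write(payload);             // record body
  }

  /** Returns the next complete record, or null at EOF / partial record. */
  static byte[] readRecord(DataInputStream in) throws IOException {
    int len;
    try {
      len = in.readInt();           // may hit EOF at or inside the prefix
    } catch (EOFException e) {
      return null;                  // truncated length prefix: ignore tail
    }
    byte[] payload = new byte[len];
    try {
      in.readFully(payload);
    } catch (EOFException e) {
      return null;                  // truncated body: ignore partial record
    }
    return payload;
  }
}
{code}

Because the reader can detect and skip a trailing partial record on its own, an aborted transaction leaves nothing that must be physically removed from the file, which is why this pattern weakens the transaction-handling argument for truncate.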