[jira] [Comment Edited] (HDFS-3107) HDFS truncate

Konstantin Shvachko (JIRA) Tue, 30 Sep 2014 00:11:01 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14152875#comment-14152875
 ]


Konstantin Shvachko edited comment on HDFS-3107 at 9/30/14 7:09 AM:
--------------------------------------------------------------------

Thanks Dhruba and Colin for your reviews of the design document.
Colin, I'll incorporate your suggestions. But it looks like you got everything 
right from the current version.

??{{boolean truncate(Path src, long newLength)}}. do we really need the boolean 
here???
* This is an optimization for the case when truncate happens on the block 
boundary. Clients will save one RPC call in this particular case.
From the NameNode perspective, returning the boolean does not require any extra 
processing.
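
A minimal sketch of the client-side saving, not the actual patch. The names {{ToyNameNode}}, {{isTruncateComplete}} and {{truncateAndWait}} are hypothetical. The toy assumes the NameNode returns true exactly when newLength falls on a block boundary, and that one follow-up status call is enough otherwise.

```java
class TruncateSketch {
    /** Toy stand-in for the NameNode; it only counts RPCs (hypothetical, not HDFS code). */
    static final class ToyNameNode {
        private final long blockSize;
        int rpcCount = 0;

        ToyNameNode(long blockSize) {
            this.blockSize = blockSize;
        }

        /** Assumed rule: true when newLength is on a block boundary, so nothing is left to adjust. */
        boolean truncate(String src, long newLength) {
            rpcCount++;
            return newLength % blockSize == 0;
        }

        /** Hypothetical status call; the toy assumes the last block is already adjusted. */
        boolean isTruncateComplete(String src) {
            rpcCount++;
            return true;
        }
    }

    /** Truncates and waits until the file is usable; returns the number of RPCs issued. */
    static int truncateAndWait(ToyNameNode nn, String src, long newLength) {
        if (!nn.truncate(src, newLength)) {
            // Only the mid-block case needs the extra round trip.
            while (!nn.isTruncateComplete(src)) {
                // a real client would sleep or back off here
            }
        }
        return nn.rpcCount;
    }

    public static void main(String[] args) {
        long blockSize = 128;
        // Truncate on a block boundary: the boolean already says "done".
        System.out.println(truncateAndWait(new ToyNameNode(blockSize), "/f", 256));
        // Truncate in the middle of a block: one more status call is needed.
        System.out.println(truncateAndWait(new ToyNameNode(blockSize), "/f", 200));
    }
}
```

Without the boolean, the client in this toy would have to issue the status call in both cases, since it could not tell them apart.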

??DFSInputStream#locatedBlocks will continue to have the block information it 
had prior to truncation.??
* Don't we have [the same behaviour with 
deletes|https://issues.apache.org/jira/browse/HDFS-3107?focusedCommentId=13237310&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13237310]?
Somebody can delete a file on the NameNode, but readers will keep reading old 
blocks until they are deleted.
Truncate doesn't add anything new in that regard.
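
To illustrate the point, here is a toy model rather than HDFS code. {{StaleReaderSketch}}, {{ToyReader}} and the helper methods are hypothetical names. It assumes a reader copies the block list once at open time and that blocks stay readable until they are physically removed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class StaleReaderSketch {
    /** Toy block storage: block id -> data. A block stays readable until removed here. */
    final Map<Integer, String> blockStore = new HashMap<>();
    /** Toy NameNode metadata: the file's current block list. */
    final List<Integer> fileBlocks = new ArrayList<>();

    /** Reader that copies the block list once at open time (stand-in for cached locatedBlocks). */
    final class ToyReader {
        private final List<Integer> cachedBlocks = new ArrayList<>(fileBlocks);

        String readAll() {
            StringBuilder sb = new StringBuilder();
            for (int id : cachedBlocks) {
                String data = blockStore.get(id);
                if (data == null) {
                    throw new IllegalStateException("block " + id + " no longer exists");
                }
                sb.append(data);
            }
            return sb.toString();
        }
    }

    void addBlock(int id, String data) {
        blockStore.put(id, data);
        fileBlocks.add(id);
    }

    /** Metadata-only change: the block is dropped from the file but not yet from storage. */
    void dropLastBlockFromMetadata() {
        fileBlocks.remove(fileBlocks.size() - 1);
    }

    ToyReader open() {
        return new ToyReader();
    }

    public static void main(String[] args) {
        StaleReaderSketch fs = new StaleReaderSketch();
        fs.addBlock(1, "aaaa");
        fs.addBlock(2, "bbbb");
        ToyReader early = fs.open();           // opened before the metadata change
        fs.dropLastBlockFromMetadata();
        System.out.println(early.readAll());     // still sees both blocks
        System.out.println(fs.open().readAll()); // a new reader sees only the first block
    }
}
```

In this toy the early reader behaves the same whether the metadata change came from a delete or from a truncate.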

??I don't think we should commit anything to trunk until we figure out how this 
integrates with snapshots.??
* You should have seen the HDFS-7056 subtask. I am mentioning it again to reassure 
you that there is no intention to avoid the snapshot issue.
* People agreed above that [they are 
OK|https://issues.apache.org/jira/browse/HDFS-3107?focusedCommentId=14129351&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14129351]
 with implementing snapshot integration in a separate jira.
* We also [agreed not to port it to branch 
2|https://issues.apache.org/jira/browse/HDFS-3107?focusedCommentId=14148406&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14148406]
 until this is completed.
* And there was a [request to 
commit|https://issues.apache.org/jira/browse/HDFS-3107?focusedCommentId=14150308&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14150308]
 this sooner rather than later.
* Besides, it seems from your comments that you yourself are in favour of 
option 3 for snapshots from the design.

So the question arises: what is not clear in the truncate-snapshot story, and 
why do you object to committing anything to trunk?


> HDFS truncate
> -------------
>
>                 Key: HDFS-3107
>                 URL: https://issues.apache.org/jira/browse/HDFS-3107
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: datanode, namenode
>            Reporter: Lei Chang
>            Assignee: Plamen Jeliazkov
>         Attachments: HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS_truncate.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> editsStored
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard POSIX operation), the reverse operation of 
> append. This forces upper-layer applications to use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
