[ 
https://issues.apache.org/jira/browse/HDFS-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195607#comment-14195607
 ] 

Colin Patrick McCabe commented on HDFS-3107:
--------------------------------------------

Hi Konstantin,

Yes, I was aware of HDFS-7056 and had commented about it earlier.  However, 
when you announced on this JIRA: "So, but the new patch from Plamen Jeliazkov 
already have snapshot support," the patch that I checked was 
{{HDFS-3107.patch}}, not a patch on a different JIRA.  After all, [~zero45] is 
the assignee on this JIRA as well as HDFS-7056, so "the new patch from Plamen 
Jeliazkov" could refer to either patch.  It seems logical to assume that what 
is being discussed on a particular JIRA is the patch attached to that JIRA, 
unless specified otherwise.

I think this highlights one confusing thing: that HDFS-3107 is now an umbrella 
JIRA as well as a JIRA with a (non-rollup) patch.  This creates confusion in 
people's minds because questions like "is HDFS-3107 done?" become ambiguous.  
It could be interpreted as either "is the patch on HDFS-3107 done?" or "is the 
feature discussed in HDFS-3107 done?"  To remove this confusion, I created the 
HDFS-7341 subtask to implement the part we've been discussing here: the 
pipeline-recovery-based solution that the other subtasks build on top of.

I would like to create the HDFS-3107 branch as soon as possible.  I would have 
already created it, but as per my comment above, I want to make sure you are 
not objecting.  The quicker we can get this stuff into the branch, the quicker 
we can get this feature polished and merged into trunk.

As I commented earlier, it's not up to me to determine if something gets into 
2.6 or not.  It's up to the release manager (currently [~acmurthy]) and the 
community to vote.  Since 2.6 is so far along (it should have been released 
weeks ago, if the original schedule had been followed), I doubt that most 
people will welcome a big new feature getting added at this stage.  I don't 
think branch versus single commit matters in this regard-- a big new feature is 
a big new feature.  It's not going to "sneak in the back door"-- nor should it, 
given the problems we've had in the past (like with HDFS append).  We have time 
to do a thorough review.  In the meantime, distributions that are already 
shipping a variant of append can continue to do so, knowing that eventually the 
feature has a path to mainline.

Let me know if you have any objections to creating the branch; otherwise I'll 
do it tomorrow.  Thanks.

> HDFS truncate
> -------------
>
>                 Key: HDFS-3107
>                 URL: https://issues.apache.org/jira/browse/HDFS-3107
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: datanode, namenode
>            Reporter: Lei Chang
>            Assignee: Plamen Jeliazkov
>         Attachments: HDFS-3107-HDFS-7056-combined.patch, HDFS-3107.008.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, HDFS-3107.patch, 
> HDFS_truncate.pdf, HDFS_truncate.pdf, HDFS_truncate.pdf, 
> HDFS_truncate_semantics_Mar15.pdf, HDFS_truncate_semantics_Mar21.pdf, 
> editsStored, editsStored.xml
>
>   Original Estimate: 1,344h
>  Remaining Estimate: 1,344h
>
> Systems with transaction support often need to undo changes made to the 
> underlying storage when a transaction is aborted. Currently HDFS does not 
> support truncate (a standard POSIX operation), the reverse of append. This 
> forces upper-layer applications to use ugly workarounds (such as 
> keeping track of the discarded byte range per file in a separate metadata 
> store, and periodically running a vacuum process to rewrite compacted files) 
> to overcome this limitation of HDFS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
