[ 
https://issues.apache.org/jira/browse/HDFS-3689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14104684#comment-14104684
 ] 

Todd Lipcon commented on HDFS-3689:
-----------------------------------

The replication issue with sparse files isn't new. rsync, for example, handles 
this with the "--sparse" flag. I haven't looked at the implementation, but my 
guess is that it would be relatively easy to implement this on the DN side 
following whatever technique rsync uses. One thought is that we could identify 
runs of zeros fairly easily by looking at the checksums: an all-zero chunk 
has a constant crc32, which we can compare against in a single instruction. 
The DN could loop through the checksums of an incoming data packet, check 
whether each chunk is all zeros, and if so, turn it into a sparse write.
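
A minimal sketch of the checksum comparison described above, not the DN's 
actual code. The names BYTES_PER_CHECKSUM, ZERO_CHUNK_CRC, chunk_checksums and 
zero_runs are hypothetical, the 512-byte chunk size is an assumption, and 
Python's zlib CRC-32 stands in for whichever checksum type the cluster uses 
(the same idea would apply to CRC32C). Because a CRC match alone does not 
prove a chunk is zero, the sketch confirms the bytes before reporting a run.

```python
import zlib
from typing import List, Tuple

# Assumption: one checksum per 512-byte chunk of packet data.
BYTES_PER_CHECKSUM = 512

# CRC-32 of a full chunk of zero bytes, computed once up front.
ZERO_CHUNK_CRC = zlib.crc32(bytes(BYTES_PER_CHECKSUM))


def chunk_checksums(data: bytes) -> List[int]:
    """Compute one CRC-32 per chunk, as a sender would attach to a packet."""
    return [
        zlib.crc32(data[i:i + BYTES_PER_CHECKSUM])
        for i in range(0, len(data), BYTES_PER_CHECKSUM)
    ]


def zero_runs(data: bytes, checksums: List[int]) -> List[Tuple[int, int]]:
    """Return (offset, length) runs of full chunks that are all zeros."""
    runs: List[Tuple[int, int]] = []
    for i, crc in enumerate(checksums):
        # Cheap test first: a single integer comparison per chunk.
        if crc != ZERO_CHUNK_CRC:
            continue
        offset = i * BYTES_PER_CHECKSUM
        chunk = data[offset:offset + BYTES_PER_CHECKSUM]
        # Confirm the bytes: a matching CRC could be a collision, and a
        # short trailing chunk is never treated as a zero run here.
        if len(chunk) != BYTES_PER_CHECKSUM or any(chunk):
            continue
        if runs and runs[-1][0] + runs[-1][1] == offset:
            # Extend the previous run when this chunk is adjacent to it.
            runs[-1] = (runs[-1][0], runs[-1][1] + BYTES_PER_CHECKSUM)
        else:
            runs.append((offset, BYTES_PER_CHECKSUM))
    return runs
```

A writer could then seek past each reported run instead of writing it, which 
leaves a hole on filesystems that support sparse files.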

> Add support for variable length block
> -------------------------------------
>
>                 Key: HDFS-3689
>                 URL: https://issues.apache.org/jira/browse/HDFS-3689
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: datanode, hdfs-client, namenode
>    Affects Versions: 3.0.0
>            Reporter: Suresh Srinivas
>            Assignee: Suresh Srinivas
>         Attachments: HDFS-3689.000.patch, HDFS-3689.001.patch
>
>
> Currently HDFS supports fixed-length blocks. Supporting variable-length blocks 
> will allow new use cases and features to be built on top of HDFS. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)
