[ https://issues.apache.org/jira/browse/HDFS-3689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14104684#comment-14104684 ]
Todd Lipcon commented on HDFS-3689:
-----------------------------------
The replication issue with sparse files isn't new. rsync, for example, handles
it with its "--sparse" flag. I haven't looked at the implementation, but my
guess is that it would be relatively easy to do the same on the DN side,
following whatever technique rsync uses. One thought is that we could identify
runs of zeros cheaply by looking at the checksums: an all-zero checksum chunk
has a constant crc32, which we can compare against in a single instruction.
The DN could loop through the checksums of an incoming data packet, check
whether every chunk is all zeros, and if so, turn the packet into a sparse
write.
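
A rough sketch of the checksum comparison, in Java. This is illustrative only
and not taken from any patch: the class ZeroChunkDetector and its methods are
hypothetical names, it uses java.util.zip.CRC32 rather than whatever checksum
type the DN is actually configured with, and it assumes the packet length is an
exact multiple of the checksum chunk size. Since a matching CRC does not by
itself prove the data is zero, the sketch confirms with a byte scan before
reporting a hit.

```java
import java.util.zip.CRC32;

/** Hypothetical helper, not part of HDFS: spots all-zero packets via their chunk CRCs. */
final class ZeroChunkDetector {
    private final int bytesPerChecksum;
    private final long zeroChunkCrc;

    ZeroChunkDetector(int bytesPerChecksum) {
        this.bytesPerChecksum = bytesPerChecksum;
        // The CRC of an all-zero chunk depends only on the chunk length,
        // so it can be computed once and reused for every comparison.
        CRC32 crc = new CRC32();
        crc.update(new byte[bytesPerChecksum], 0, bytesPerChecksum);
        this.zeroChunkCrc = crc.getValue();
    }

    /** Computes one CRC32 per full chunk; stands in for the checksums sent with a packet. */
    long[] checksumsOf(byte[] data) {
        long[] sums = new long[data.length / bytesPerChecksum];
        CRC32 crc = new CRC32();
        for (int i = 0; i < sums.length; i++) {
            crc.reset();
            crc.update(data, i * bytesPerChecksum, bytesPerChecksum);
            sums[i] = crc.getValue();
        }
        return sums;
    }

    /** Returns true only if every chunk checksum matches the zero constant and the bytes really are zero. */
    boolean isZeroPacket(byte[] data, long[] checksums) {
        if (checksums.length == 0 || data.length != checksums.length * bytesPerChecksum) {
            return false; // empty or partial-chunk packets are out of scope for this sketch
        }
        // Cheap pass: one integer comparison per chunk.
        for (long sum : checksums) {
            if (sum != zeroChunkCrc) {
                return false;
            }
        }
        // A CRC match can in principle be a collision, so confirm before punching a hole.
        for (byte b : data) {
            if (b != 0) {
                return false;
            }
        }
        return true;
    }
}
```

When isZeroPacket returns true, the writer could advance the file position past
the range instead of writing the bytes, which leaves a hole on filesystems that
support sparse files. The common case (real data) exits on the first mismatching
checksum, so the extra cost per packet should be small.
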
> Add support for variable length block
> -------------------------------------
>
> Key: HDFS-3689
> URL: https://issues.apache.org/jira/browse/HDFS-3689
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: datanode, hdfs-client, namenode
> Affects Versions: 3.0.0
> Reporter: Suresh Srinivas
> Assignee: Suresh Srinivas
> Attachments: HDFS-3689.000.patch, HDFS-3689.001.patch
>
>
> Currently HDFS supports only fixed-length blocks. Supporting variable-length
> blocks will allow new use cases and features to be built on top of HDFS.