[
https://issues.apache.org/jira/browse/HDFS-3689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14104536#comment-14104536
]
Doug Cutting commented on HDFS-3689:
------------------------------------
[~sureshms], variable length blocks would permit applications to read data
without modification, but with surprising performance and impact on cluster
resources. One could, e.g., efficiently append a bunch of CSV files to
generate a big CSV file that has variable length blocks, then run MapReduce
jobs over that file. But the file reads would no longer be block aligned and
the job would behave differently than one might expect.
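
To make the alignment concern concrete, here is a rough sketch in Python, not HDFS or MapReduce code. It assumes a job that cuts input splits at a fixed byte size and counts how many of those splits touch more than one block. The helper names and the block lengths are hypothetical, chosen only for illustration.

```python
from bisect import bisect_right
from itertools import accumulate

def block_starts(block_lengths):
    """Return the starting byte offset of each block in the file."""
    return [0] + list(accumulate(block_lengths))[:-1]

def splits_spanning_blocks(block_lengths, split_size):
    """Count fixed-size splits whose byte range touches more than one block.

    Assumption: splits are cut every `split_size` bytes from offset 0,
    regardless of where the block boundaries actually fall.
    """
    starts = block_starts(block_lengths)
    total = sum(block_lengths)
    spanning = 0
    for begin in range(0, total, split_size):
        end = min(begin + split_size, total) - 1  # last byte of this split
        first_block = bisect_right(starts, begin) - 1
        last_block = bisect_right(starts, end) - 1
        if first_block != last_block:
            spanning += 1
    return spanning

MB = 1024 * 1024
# Four equal blocks: every split lines up with exactly one block.
fixed_blocks = [128 * MB] * 4
# Hypothetical result of appending several files of different sizes.
variable_blocks = [100 * MB, 37 * MB, 128 * MB, 5 * MB, 90 * MB, 152 * MB]

fixed_spanning = splits_spanning_blocks(fixed_blocks, 128 * MB)
variable_spanning = splits_spanning_blocks(variable_blocks, 128 * MB)
```

Under these assumptions no split crosses a block boundary in the fixed layout, while most splits do in the variable layout, so a task reading such a split would have to fetch part of its input from a second block.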
On the other hand, a sparse file would permit folks to append data as
efficiently as variable-length blocks, but to unmodified applications their
input would now have chunks of zeros inserted and would likely not be
well-formatted data. So using sparse files forces applications to explicitly
adopt the feature, rather than appearing to still work but with radically
different performance.
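
The sparse-file behavior can be sketched the same way. The Python below is a toy model under stated assumptions, not an HDFS API: an append pads the existing data with zeros up to the next block boundary, an unmodified reader parses whatever bytes it gets, and a reader that has adopted the feature drops the zero padding first. All function names are hypothetical, and the hole-aware reader assumes the real data never contains legitimate NUL bytes.

```python
def append_sparse(existing: bytes, new_data: bytes, block_size: int) -> bytes:
    """Zero-pad `existing` to the next block boundary, then append `new_data`."""
    padding = (-len(existing)) % block_size
    return existing + b"\x00" * padding + new_data

def naive_rows(data: bytes):
    """Unmodified reader: split into lines and comma-separated fields."""
    return [line.split(b",") for line in data.split(b"\n") if line]

def hole_aware_rows(data: bytes):
    """Reader that knowingly adopts the feature: strip zero padding first."""
    return naive_rows(data.replace(b"\x00", b""))

first_file = b"a,1\nb,2\n"   # 8 bytes, half of a 16-byte toy block
second_file = b"c,3\n"
merged = append_sparse(first_file, second_file, block_size=16)

broken = naive_rows(merged)       # third row starts with a run of zero bytes
clean = hole_aware_rows(merged)   # padding removed before parsing
```

In this toy model the unmodified reader gets a third row whose first field begins with eight zero bytes, so the problem shows up as malformed data rather than as a silent performance change.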
It might be better not to have a "transparent" feature that contains
performance surprises, but instead have something that both writers and readers
must knowingly adopt.
> Add support for variable length block
> -------------------------------------
>
> Key: HDFS-3689
> URL: https://issues.apache.org/jira/browse/HDFS-3689
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: datanode, hdfs-client, namenode
> Affects Versions: 3.0.0
> Reporter: Suresh Srinivas
> Assignee: Suresh Srinivas
> Attachments: HDFS-3689.000.patch, HDFS-3689.001.patch
>
>
> Currently HDFS supports only fixed length blocks. Supporting variable length
> blocks will allow new use cases and features to be built on top of HDFS.