[ 
https://issues.apache.org/jira/browse/HDFS-3689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14104536#comment-14104536
 ] 

Doug Cutting commented on HDFS-3689:
------------------------------------

[~sureshms], variable-length blocks would let unmodified applications read the 
data, but with surprising performance and a surprising impact on cluster 
resources.  One could, e.g., efficiently append a bunch of CSV files to 
generate a big CSV file that has variable-length blocks, then run MapReduce 
jobs over that file.  But the file reads would no longer be block-aligned, and 
the job would behave differently than one might expect.
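
To make the alignment concern concrete, here is a rough sketch in Python. It is 
a toy model, not HDFS or MapReduce code: it assumes readers cut the file into 
fixed-size splits at a nominal block size, and every name and size in it is 
hypothetical. It counts how many of those splits straddle more than one block.

```python
from itertools import accumulate

def block_ranges(block_lengths):
    """Return half-open (start, end) byte ranges for consecutive blocks."""
    ends = list(accumulate(block_lengths))
    starts = [0] + ends[:-1]
    return list(zip(starts, ends))

def fixed_splits(file_length, split_size):
    """Cut the file into fixed-size splits (assumed reader behavior)."""
    return [(off, min(off + split_size, file_length))
            for off in range(0, file_length, split_size)]

def blocks_touched(split, ranges):
    """Indices of the blocks that overlap the given split."""
    start, end = split
    return [i for i, (s, e) in enumerate(ranges) if s < end and e > start]

def straddling_splits(block_lengths, split_size):
    """Count splits whose bytes live in more than one block."""
    ranges = block_ranges(block_lengths)
    splits = fixed_splits(sum(block_lengths), split_size)
    return sum(1 for sp in splits if len(blocks_touched(sp, ranges)) > 1)

# Sizes in arbitrary units; both files are 512 units long.
fixed_layout = [128, 128, 128, 128]
variable_layout = [128, 40, 128, 90, 126]  # e.g. after appending short files

print(straddling_splits(fixed_layout, 128))     # 0
print(straddling_splits(variable_layout, 128))  # 3
```

In this model every split of the fixed-length file maps to exactly one block, 
while three of the four splits of the variable-length file span two blocks, 
which is the kind of quiet change in read pattern described above.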

On the other hand, a sparse file would permit folks to append data as 
efficiently as variable-length blocks, but to unmodified applications their 
input would now have chunks of zeros inserted and would likely not be 
well-formatted data.  So using sparse files forces applications to explicitly 
adopt the feature, rather than appearing to still work but with radically 
different performance.
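
A similarly rough Python sketch of the sparse-file case follows. It is only an 
illustration under assumptions: the 16-byte block size, the zero padding to a 
block boundary, and the helper names are all hypothetical, and a real reader 
would more likely learn where the holes are from metadata than by stripping 
NUL bytes.

```python
BLOCK_SIZE = 16  # tiny, hypothetical block size for illustration

def pad_to_block(data, block_size=BLOCK_SIZE):
    """Zero-fill data up to the next block boundary."""
    remainder = len(data) % block_size
    if remainder == 0:
        return data
    return data + b"\x00" * (block_size - remainder)

def sparse_concat(parts):
    """Append parts so that each one starts on a block boundary."""
    return b"".join(pad_to_block(p) for p in parts)

def naive_rows(data):
    """An unmodified reader: split on newlines and commas only."""
    return [line.split(b",") for line in data.split(b"\n") if line]

def aware_rows(data):
    """A reader that has adopted the feature and skips zero padding."""
    rows = []
    for line in data.split(b"\n"):
        line = line.strip(b"\x00")
        if line:
            rows.append(line.split(b","))
    return rows

combined = sparse_concat([b"a,1\nb,2\n", b"c,3\n"])

print(naive_rows(combined))  # rows contain runs of b"\x00"
print(aware_rows(combined))  # [[b'a', b'1'], [b'b', b'2'], [b'c', b'3']]
```

The unmodified reader sees zero bytes glued to the front of the third record 
plus a trailing record made only of zeros, so it fails visibly instead of 
silently running with different performance.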

It might be better not to have a "transparent" feature that contains 
performance surprises, but instead have something that both writers and readers 
must knowingly adopt.

> Add support for variable length block
> -------------------------------------
>
>                 Key: HDFS-3689
>                 URL: https://issues.apache.org/jira/browse/HDFS-3689
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: datanode, hdfs-client, namenode
>    Affects Versions: 3.0.0
>            Reporter: Suresh Srinivas
>            Assignee: Suresh Srinivas
>         Attachments: HDFS-3689.000.patch, HDFS-3689.001.patch
>
>
> Currently HDFS supports fixed length blocks. Supporting variable length block 
> will allow new use cases and features to be built on top of HDFS. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)
