[
https://issues.apache.org/jira/browse/HDFS-3689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106298#comment-14106298
]
Colin Patrick McCabe commented on HDFS-3689:
--------------------------------------------
So, there are a few use-cases for variable-length blocks that we've kicked
around in the past:
* Simpler implementation of append and pipeline recovery. We could just start
a new block and forget about the old blocks. genstamp can go away, as well as
all the pipeline recovery code and replica state machine. Replicas are then
either finalized or not, like in the original Hadoop versions.
* Make hdfsConcat fully generic, rather than requiring N-1 of the files being
concatenated to be exactly 1 block long, as it does now. This would make that
call a lot more useful. (Implemented above by Jing)
* Some file formats really, really want to have block-aligned records. This is
natural if you want to have one node process a set of records... you don't want
"torn" records that span multiple datanodes. Apache Parquet is certainly one
of these formats; I think ORCFile is too. Right now these file formats need to
accept "torn" records or add padding. I guess sparse files could make the
padding less inefficient.
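To put a rough number on the padding cost mentioned in the last use-case, here
is a minimal sketch in plain Java. It is not HDFS, Parquet, or ORC code: the
class BlockPadding and the method paddingFor are hypothetical names, and it
assumes a fixed block size and that every record (or row group) fits inside one
block. It walks the records in order and, whenever the next record would
straddle a block boundary, pads out the rest of the current block.

```java
import java.util.List;

// Hypothetical helper, for illustration only; not part of any HDFS API.
final class BlockPadding {
    /**
     * Returns the total number of padding bytes a writer would need so that
     * no record spans a block boundary, given fixed-size blocks.
     */
    static long paddingFor(long blockSize, List<Long> recordSizes) {
        long offset = 0;   // current write position in the file
        long padding = 0;  // padding bytes added so far
        for (long size : recordSizes) {
            if (size > blockSize) {
                throw new IllegalArgumentException("record larger than a block: " + size);
            }
            long remaining = blockSize - (offset % blockSize);
            if (size > remaining) {
                // The record would be "torn": skip to the next block boundary.
                padding += remaining;
                offset += remaining;
            }
            offset += size;
        }
        return padding;
    }
}
```

With variable-length blocks the writer could instead end the block early at
each of those points, so the padding would be zero by construction.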
Disadvantages of variable-length blocks:
* As Doug pointed out, MapReduce InputFormats that use the number of blocks to
decide on a good data split won't work too well. I wonder how much effort it
would take to convert these to take block length into account?
* Other applications may also be assuming fixed block sizes, although our APIs
have never technically guaranteed that.
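As a rough idea of what "taking block length into account" could look like,
here is a minimal sketch in plain Java. It does not use the real MapReduce
InputFormat API: the class LengthAwareSplits, the method splitSizes, and the
targetBytes parameter are all hypothetical. It groups consecutive blocks into a
split until the accumulated bytes reach the target, so that splits stay roughly
even in size no matter how many blocks they contain.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only; not the MapReduce InputFormat API.
final class LengthAwareSplits {
    /**
     * Groups consecutive blocks into splits by accumulated byte length.
     * Returns the size in bytes of each split; every split ends on a
     * block boundary.
     */
    static List<Long> splitSizes(long[] blockLengths, long targetBytes) {
        List<Long> splits = new ArrayList<>();
        long current = 0;
        for (long len : blockLengths) {
            current += len;
            if (current >= targetBytes) {
                // Enough data accumulated: close this split at the block boundary.
                splits.add(current);
                current = 0;
            }
        }
        if (current > 0) {
            splits.add(current); // trailing blocks that did not reach the target
        }
        return splits;
    }
}
```

For example, five blocks of 10, 120, 5, 5, and 128 bytes with a 128-byte target
come out as two splits of 130 and 138 bytes, where a count-based approach would
treat each of the five blocks as equal work.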
> Add support for variable length block
> -------------------------------------
>
> Key: HDFS-3689
> URL: https://issues.apache.org/jira/browse/HDFS-3689
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: datanode, hdfs-client, namenode
> Affects Versions: 3.0.0
> Reporter: Suresh Srinivas
> Assignee: Suresh Srinivas
> Attachments: HDFS-3689.000.patch, HDFS-3689.001.patch
>
>
> Currently HDFS supports fixed length blocks. Supporting variable length block
> will allow new use cases and features to be built on top of HDFS.