[jira] [Commented] (HDFS-3689) Add support for variable length block

Colin Patrick McCabe (JIRA) Thu, 21 Aug 2014 18:27:54 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-3689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14106298#comment-14106298
 ]


Colin Patrick McCabe commented on HDFS-3689:
--------------------------------------------

So, there are a few use-cases for variable-length blocks that we've kicked 
around in the past:

* Simpler implementation of append and pipeline recovery.  We could just start 
a new block and forget about the old blocks.  genstamp can go away, as well as 
all the pipeline recovery code and replica state machine.  Replicas are then 
either finalized or not, like in the original Hadoop versions.

* Make hdfsConcat fully generic, rather than requiring N-1 of the files being 
concatenated to be exactly 1 block long, as it does now.  This would make that 
call a lot more useful.  (Implemented above by Jing)

* Some file formats really, really want to have block-aligned records.  This is 
natural if you want to have one node process a set of records... you don't want 
"torn" records that span multiple datanodes.  Apache Parquet is certainly one 
of these formats; I think ORCFile is too.  Right now these file formats need to 
accept "torn" records or add padding.  I guess sparse files could make the 
padding less inefficient.
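
To make the padding cost in the last point concrete, here is a rough sketch of what a writer has to do today with fixed-size blocks.  The class and method names are made up for illustration and are not part of any HDFS, Parquet, or ORC API.  It assumes a record group is never larger than one block.

```java
public class BlockPadding {
    /**
     * Returns how many padding bytes must be written at the given file offset
     * so that a record group of groupLen bytes does not straddle a fixed-size
     * block boundary.  Assumes groupLen <= blockSize (hypothetical constraint).
     */
    public static long paddingBefore(long offset, long groupLen, long blockSize) {
        long usedInBlock = offset % blockSize;
        long remaining = blockSize - usedInBlock;
        // No padding if we are at a block boundary or the group fits in the
        // space left in the current block.
        if (usedInBlock == 0 || groupLen <= remaining) {
            return 0;
        }
        // Otherwise the rest of the current block is wasted as padding.
        return remaining;
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024;
        long blockSize = 128 * mib;
        // 100 MiB already written, next record group is 64 MiB:
        // the remaining 28 MiB of the block has to be padded.
        System.out.println(paddingBefore(100 * mib, 64 * mib, blockSize) / mib + " MiB of padding");
    }
}
```

With variable-length blocks the writer could instead end the current block at 100 MiB and start the record group in a fresh block, so the padding would be zero.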

Disadvantages of variable-length blocks:

* As Doug pointed out, MapReduce InputFormats that use the number of blocks to 
decide on a good data split won't work too well.  I wonder how much effort it 
would take to convert these to take block length into account?

* Other applications may also be assuming fixed block sizes, although our APIs 
have never technically guaranteed that.
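
As a rough idea of what "taking block length into account" might look like, here is a minimal sketch that groups consecutive blocks into splits by total bytes instead of by block count.  This is not the existing FileInputFormat logic; the class, the Split type, and the targetBytes parameter are all hypothetical, and it ignores locality entirely.

```java
import java.util.ArrayList;
import java.util.List;

public class LengthAwareSplits {
    /** A split covering bytes [start, start + length) of a file. */
    public static final class Split {
        public final long start;
        public final long length;

        public Split(long start, long length) {
            this.start = start;
            this.length = length;
        }

        @Override
        public String toString() {
            return "[" + start + ", +" + length + ")";
        }
    }

    /**
     * Groups consecutive blocks into splits of at most targetBytes each,
     * always cutting on a block boundary.  A single block larger than
     * targetBytes becomes its own split.
     */
    public static List<Split> computeSplits(long[] blockLengths, long targetBytes) {
        List<Split> splits = new ArrayList<>();
        long start = 0;    // file offset where the current split begins
        long current = 0;  // bytes accumulated in the current split
        for (long len : blockLengths) {
            // Close the current split if this block would push it past the target.
            if (current > 0 && current + len > targetBytes) {
                splits.add(new Split(start, current));
                start += current;
                current = 0;
            }
            current += len;
        }
        if (current > 0) {
            splits.add(new Split(start, current));
        }
        return splits;
    }

    public static void main(String[] args) {
        // Variable-length blocks (sizes in MB, for readability).
        long[] blocks = {10, 120, 5, 128, 60};
        System.out.println(computeSplits(blocks, 128));
    }
}
```

The point of the sketch is only that splits stay block-aligned while their sizes are balanced by bytes, so a file made of many small blocks does not turn into many tiny splits.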

> Add support for variable length block
> -------------------------------------
>
>                 Key: HDFS-3689
>                 URL: https://issues.apache.org/jira/browse/HDFS-3689
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: datanode, hdfs-client, namenode
>    Affects Versions: 3.0.0
>            Reporter: Suresh Srinivas
>            Assignee: Suresh Srinivas
>         Attachments: HDFS-3689.000.patch, HDFS-3689.001.patch
>
>
> Currently HDFS supports fixed length blocks. Supporting variable length block 
> will allow new use cases and features to be built on top of HDFS. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)
