[
https://issues.apache.org/jira/browse/HDFS-3689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14107194#comment-14107194
]
Owen O'Malley commented on HDFS-3689:
-------------------------------------
One follow-up: fixing MapReduce to use the actual block boundaries, rather than
dividing the file into fixed-size splits, would not be difficult and would
produce much better file splits for ORC and other block-compressed files.
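
To illustrate the difference, here is a minimal sketch in plain Python rather than Hadoop code. The function names and the example block lengths are hypothetical, and the fixed-size splitter is a simplification that ignores details of the real MapReduce split logic. It counts how many splits straddle a block boundary under each strategy.

```python
from typing import List, Tuple

Split = Tuple[int, int]  # (offset, length) in bytes


def fixed_size_splits(file_length: int, split_size: int) -> List[Split]:
    """Cut the file every split_size bytes, ignoring where blocks end."""
    splits = []
    offset = 0
    while offset < file_length:
        length = min(split_size, file_length - offset)
        splits.append((offset, length))
        offset += length
    return splits


def block_boundary_splits(block_lengths: List[int]) -> List[Split]:
    """Emit one split per block, using each block's actual length."""
    splits = []
    offset = 0
    for length in block_lengths:
        if length > 0:
            splits.append((offset, length))
        offset += length
    return splits


def splits_crossing_blocks(splits: List[Split], block_lengths: List[int]) -> int:
    """Count splits that contain a block boundary strictly inside them."""
    boundaries = []
    offset = 0
    for length in block_lengths:
        offset += length
        boundaries.append(offset)
    return sum(
        1
        for start, length in splits
        if any(start < b < start + length for b in boundaries)
    )


# Hypothetical file made of three variable-length blocks (300 bytes total).
blocks = [100, 60, 140]
fixed = fixed_size_splits(sum(blocks), 128)
aligned = block_boundary_splits(blocks)
```

With these made-up lengths, two of the three fixed-size splits span two blocks, while every block-aligned split stays within a single block.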
Furthermore, we could remove the need for LZO and zlib index files for text
files by having TextOutputFormat cut the block at a line boundary and flush the
compression codec. TextInputFormat could then divide the file at block
boundaries, each of which would align with both a compression chunk boundary
and a line break. That would be *great*.
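
A rough sketch of that idea, again in plain Python and not the TextOutputFormat API. The names `write_blocks` and `read_block` are hypothetical. "Flushing the codec" is modelled here by ending one zlib stream per block, which is only one possible interpretation. The block size is measured in uncompressed bytes to keep the example short.

```python
import zlib
from typing import Iterable, List


def write_blocks(lines: Iterable[bytes], target_block_size: int) -> List[bytes]:
    """Compress newline-terminated lines into independently readable blocks.

    A block is closed only after a complete line, once at least
    target_block_size uncompressed bytes have been written to it.
    """
    blocks = []
    compressor = zlib.compressobj()
    pending = []   # compressed chunks of the block being built
    raw_bytes = 0  # uncompressed bytes in the block being built
    for line in lines:
        pending.append(compressor.compress(line))
        raw_bytes += len(line)
        if raw_bytes >= target_block_size:
            # Finish the stream at a line boundary and start a new block.
            pending.append(compressor.flush())
            blocks.append(b"".join(pending))
            compressor = zlib.compressobj()
            pending, raw_bytes = [], 0
    if raw_bytes > 0:
        pending.append(compressor.flush())
        blocks.append(b"".join(pending))
    return blocks


def read_block(block: bytes) -> List[bytes]:
    """Decompress a single block without looking at any other block."""
    return zlib.decompress(block).splitlines(keepends=True)


lines = [b"line %d\n" % i for i in range(10)]  # 7 bytes each
blocks = write_blocks(lines, target_block_size=20)
```

Because each block starts at both a compression boundary and a line boundary, a reader can process any block on its own, with no index file and no need to scan the previous block for a partial line.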
> Add support for variable length block
> -------------------------------------
>
> Key: HDFS-3689
> URL: https://issues.apache.org/jira/browse/HDFS-3689
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: datanode, hdfs-client, namenode
> Affects Versions: 3.0.0
> Reporter: Suresh Srinivas
> Assignee: Suresh Srinivas
> Attachments: HDFS-3689.000.patch, HDFS-3689.001.patch
>
>
> Currently HDFS supports fixed length blocks. Supporting variable length block
> will allow new use cases and features to be built on top of HDFS.
--
This message was sent by Atlassian JIRA
(v6.2#6252)