[
https://issues.apache.org/jira/browse/HIVE-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Carl Steinbach updated HIVE-1071:
---------------------------------
Component/s: Serializers/Deserializers
> Making RCFile "concatenatable" to reduce the number of files of the output
> --------------------------------------------------------------------------
>
> Key: HIVE-1071
> URL: https://issues.apache.org/jira/browse/HIVE-1071
> Project: Hive
> Issue Type: Improvement
> Components: Serializers/Deserializers
> Reporter: Zheng Shao
>
> Hive automatically determines the number of reducers most of the time.
> Sometimes, we create a lot of small files.
> Hive has an option to "merge" those small files through a map-reduce job.
> Dhruba has an idea that can fix it even faster:
> if we can make RCFile concatenatable, then we can simply tell the namenode to
> "merge" these files.
> Pros: This approach does not do any I/O so it's faster.
> Cons: We have to zero-fill the files to make sure they can be concatenated
> (all blocks except the last have to be full HDFS blocks).
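
The zero-fill cost named under Cons can be estimated with simple arithmetic. The following is a minimal sketch, not Hive or HDFS code: the function names `padding_bytes` and `concat_plan` are hypothetical, and it assumes only what the description states, namely that every file except the last must end on a full HDFS block boundary.

```python
def padding_bytes(file_size: int, block_size: int) -> int:
    """Zero bytes needed to extend a file to the next full block boundary.

    Hypothetical helper for illustration; not part of Hive or HDFS.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    if file_size < 0:
        raise ValueError("file_size must be non-negative")
    # A file that already ends on a block boundary needs no padding.
    return (-file_size) % block_size


def concat_plan(file_sizes, block_size):
    """Padding per file so the files can be concatenated in order.

    Every file except the last is padded to a whole number of blocks;
    the last file may end with a partial block, so it gets no padding.
    """
    sizes = list(file_sizes)
    pads = [padding_bytes(size, block_size) for size in sizes[:-1]]
    if sizes:
        pads.append(0)
    return pads
```

For example, with a 64-byte block size (unrealistically small, chosen to keep the numbers readable), `concat_plan([100, 64, 10], 64)` yields `[28, 0, 0]`: only the first file needs padding. How a reader would skip the zero-filled regions is not covered by this description and is left out of the sketch.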
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira