[ https://issues.apache.org/jira/browse/HIVE-1071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12803040#action_12803040 ]
Jeff Hammerbacher commented on HIVE-1071:
-----------------------------------------

bq. we could create an API in HDFS that concatenates a set of files into one file.

Would be a fantastic primitive to add to HDFS.

> Making RCFile "concatenatable" to reduce the number of files of the output
> --------------------------------------------------------------------------
>
>                 Key: HIVE-1071
>                 URL: https://issues.apache.org/jira/browse/HIVE-1071
>             Project: Hadoop Hive
>          Issue Type: Improvement
>            Reporter: Zheng Shao
>
> Hive automatically determines the number of reducers most of the time.
> Sometimes, we create a lot of small files.
> Hive has an option to "merge" those small files through a map-reduce job.
> Dhruba has an idea that can fix it even faster:
> if we can make RCFile concatenatable, then we can simply tell the namenode to
> "merge" these files.
> Pros: This approach does not do any I/O, so it's faster.
> Cons: We have to zero-fill the files to make sure they can be concatenated
> (all blocks except the last have to be full HDFS blocks).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
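
To make the "Cons" item in the issue description concrete, here is a rough sketch of the zero-fill arithmetic it implies: every file except the last is padded up to a whole number of HDFS blocks so that only the final block of the merged file is partial. The function names and the tiny block size are hypothetical and chosen for illustration only; this is not an HDFS or Hive API.

```python
from typing import List


def zero_fill_bytes(file_size: int, block_size: int) -> int:
    """Bytes of zero padding needed to round file_size up to a full block."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    if file_size < 0:
        raise ValueError("file_size must be non-negative")
    remainder = file_size % block_size
    return 0 if remainder == 0 else block_size - remainder


def concat_padding_plan(file_sizes: List[int], block_size: int) -> List[int]:
    """Padding per file so the files can be joined block by block.

    Every file except the last is padded to a block boundary; the last
    file is left as-is, since only the final block may be partial.
    """
    if not file_sizes:
        return []
    plan = [zero_fill_bytes(size, block_size) for size in file_sizes[:-1]]
    plan.append(0)  # the last file needs no padding
    return plan


if __name__ == "__main__":
    # Hypothetical 64-byte block size, just to keep the numbers readable.
    sizes = [10, 64, 100, 5]
    plan = concat_padding_plan(sizes, block_size=64)
    print(plan)
    print("bytes added by padding:", sum(plan))
```

With many small files, the padding can exceed the actual data, which is the space cost being traded against skipping the map-reduce merge job.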