Making RCFile "concatenatable" to reduce the number of files of the output
--------------------------------------------------------------------------
Key: HIVE-1071
URL: https://issues.apache.org/jira/browse/HIVE-1071
Project: Hadoop Hive
Issue Type: Improvement
Reporter: Zheng Shao

Hive automatically determines the number of reducers most of the time. Sometimes we end up creating a lot of small files. Hive has an option to "merge" those small files through a map-reduce job.

Dhruba has an idea that can fix this even faster: if we can make RCFile concatenatable, then we can simply tell the namenode to "merge" these files.

Pros: This approach does not do any I/O, so it is faster.
Cons: We have to zero-fill the files to make sure they can be concatenated (all blocks except the last have to be full HDFS blocks).
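To make the zero-fill cost concrete, here is a rough sketch of the padding arithmetic implied by the "all blocks except the last have to be full HDFS blocks" constraint. It is only an illustration: the function names (zero_fill_length, plan_concat) and the 64 MiB block size are assumptions made for this example, not part of Hive, RCFile, or HDFS.

```python
MIB = 1024 * 1024


def zero_fill_length(file_size: int, block_size: int) -> int:
    """Bytes of zero padding needed to round file_size up to a block boundary.

    Hypothetical helper for illustration only.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    if file_size < 0:
        raise ValueError("file_size must be non-negative")
    # Python's % returns a non-negative result for a positive divisor,
    # so this is 0 when file_size is already block-aligned.
    return (-file_size) % block_size


def plan_concat(file_sizes: list[int], block_size: int) -> list[int]:
    """Padding per file so the files can be concatenated in the given order.

    Every file except the last is padded to a whole number of blocks;
    the last file may end with a partial block, so it needs no padding.
    """
    if not file_sizes:
        return []
    pads = [zero_fill_length(size, block_size) for size in file_sizes[:-1]]
    pads.append(0)
    return pads


if __name__ == "__main__":
    block = 64 * MIB  # assumed block size for the example
    sizes = [10 * MIB, 64 * MIB, 100 * MIB, 5 * MIB]
    for size, pad in zip(sizes, plan_concat(sizes, block)):
        print(f"file of {size // MIB} MiB needs {pad // MIB} MiB of zero-fill")
```

Under these assumptions, a 10 MiB file followed by other files would need 54 MiB of zero-fill, which suggests the padding overhead can be significant when the output files are much smaller than a block.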