Making RCFile "concatenatable" to reduce the number of files of the output
--------------------------------------------------------------------------

                 Key: HIVE-1071
                 URL: https://issues.apache.org/jira/browse/HIVE-1071
             Project: Hadoop Hive
          Issue Type: Improvement
            Reporter: Zheng Shao


Hive automatically determines the number of reducers most of the time.
As a result, a job sometimes creates a lot of small output files.

Hive has an option to "merge" those small files through a map-reduce job.

Dhruba has an idea that could fix this even faster:
if we can make RCFile concatenatable, then we can simply tell the namenode to 
"merge" these files.

Pros: The merge itself does not do any data I/O, so it's faster than the 
map-reduce merge.
Cons: We have to zero-fill the files to make sure they can be concatenated (all 
blocks except the last have to be full HDFS blocks).
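
To get a feel for the cost of the zero-fill, here is a minimal sketch (Python, 
not Hive code) that computes how many padding bytes each file would need so 
that every file except the last ends on a block boundary. The names 
padding_bytes and concat_plan are made up for illustration, and the block size 
of 64 is just a toy value; how RCFile readers would skip the padding is not 
covered here.

```python
from typing import List, Tuple


def padding_bytes(file_size: int, block_size: int) -> int:
    """Zero bytes needed to extend file_size to the next block boundary.

    Hypothetical helper: returns 0 when the file already ends on a boundary.
    """
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    return (-file_size) % block_size


def concat_plan(file_sizes: List[int], block_size: int) -> Tuple[List[int], int]:
    """Return (padding per file, total padding) for files concatenated in order.

    Assumption taken from the issue text: all blocks except the last must be
    full, so every file except the final one is padded to a block boundary.
    """
    pads = [padding_bytes(size, block_size) for size in file_sizes[:-1]]
    if file_sizes:
        pads.append(0)  # the last file may end with a partial block
    return pads, sum(pads)


if __name__ == "__main__":
    # Toy sizes in bytes with a toy block size of 64 bytes.
    pads, total = concat_plan([10, 64, 70, 5], block_size=64)
    print(pads, total)
```

With these toy numbers, 149 bytes of real data need 112 bytes of padding, 
which shows why the zero-fill is listed as a con when the files are much 
smaller than a block.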



