[ https://issues.apache.org/jira/browse/HIVE-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992112#comment-12992112 ]
He Yongqiang commented on HIVE-1950:
------------------------------------

Review comments from internal review:
1) if stats are present, try to correct them
2) jobClose of RCFileMergeMapper should share the code in FileSinkOperator
3) move the original data to a dump location first
4) remove getRecordWriter() and RCFileBlockMergeOutputFormat
5) ioCxt for input file changed
6) disable merge for archived tables/partitions and bucketized tables/partitions
7) comments
8) negative tests for HiveInputFormat

> Block merge for RCFile
> ----------------------
>
>                 Key: HIVE-1950
>                 URL: https://issues.apache.org/jira/browse/HIVE-1950
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: He Yongqiang
>            Assignee: He Yongqiang
>         Attachments: HIVE-1950.1.patch
>
>
> In our env, there are a lot of small files inside one partition/table. In
> order to reduce the namenode load, we have one dedicated housekeeping job
> running to merge these files. Right now the merge is an 'insert overwrite' in
> Hive, and requires decompressing and recompressing the data. This jira is to
> add a command in Hive to do the merge without decompressing and recompressing
> the data. Something like "alter table tbl_name [partition ()] merge files".
> In this jira the new command will only support RCFile, since it needs some
> new APIs in the file format.

--
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira