[ 
https://issues.apache.org/jira/browse/HIVE-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992112#comment-12992112
 ] 

He Yongqiang commented on HIVE-1950:
------------------------------------

Review comments from the internal review:
1) if stats are present, try to correct them
2) jobClose of RCFileMergeMapper should share the code in FileSinkOperator
3) move the original data to a dump location first
4) remove getRecordWriter() and RCFileBlockMergeOutputFormat
5) ioCxt for input file changed
6) disable merge for archived tables/partitions and bucketized tables/partitions
7) comments
8) negative tests for HiveInputFormat



> Block merge for RCFile
> ----------------------
>
>                 Key: HIVE-1950
>                 URL: https://issues.apache.org/jira/browse/HIVE-1950
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: He Yongqiang
>            Assignee: He Yongqiang
>         Attachments: HIVE-1950.1.patch
>
>
> In our environment, there are a lot of small files inside a single
> partition or table. To reduce the namenode load, we run a dedicated
> housekeeping job to merge these files. Right now the merge is an
> 'insert overwrite' in Hive, which requires decompressing and then
> recompressing the data. This jira is to add a command in Hive that does the
> merge without decompressing and recompressing the data, something like
> "alter table tbl_name [partition ()] merge files". In this jira the new
> command will only support RCFile, since the merge needs some new APIs in
> the file format.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
