[ 
https://issues.apache.org/jira/browse/HIVE-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-1950:
-----------------------------
    Description: 
In our environment, there are many small files inside a single partition or
table. To reduce the namenode load, we run a dedicated housekeeping job that
merges these files. Right now the merge is an 'insert overwrite' in Hive, which
requires decompressing the data and then recompressing it. This jira adds a
command to Hive that performs the merge without decompressing and recompressing
the data.

Something like "alter table tbl_name [partition ()] concatenate". In this jira
the new command will only support RCFile, since it requires some new APIs in
the file format.
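
As a rough illustration of how a housekeeping job might issue the proposed
command, the Python sketch below builds one statement per partition. The helper
name, table name, and partition values are hypothetical, and the statement
follows the "alter table tbl_name [partition ()] concatenate" form proposed
above, which may differ from what is finally committed.

```python
def concatenate_statement(table, partition=None):
    """Build an 'ALTER TABLE ... CONCATENATE' statement as a string.

    `partition` is an optional dict mapping partition column -> value.
    This is an illustrative helper, not part of Hive; it does no escaping.
    """
    stmt = f"ALTER TABLE {table}"
    if partition:
        spec = ", ".join(f"{col}='{val}'" for col, val in partition.items())
        stmt += f" PARTITION ({spec})"
    return stmt + " CONCATENATE"


# Hypothetical table and partition values, for illustration only.
for part in [{"ds": "2011-02-01"}, {"ds": "2011-02-02"}]:
    print(concatenate_statement("page_views", part))
```

The generated statements would replace the 'insert overwrite' currently used by
the merge job. Since the helper does not escape its inputs, it is only suitable
for trusted table and partition names.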

  was:
In our environment, there are many small files inside a single partition or
table. To reduce the namenode load, we run a dedicated housekeeping job that
merges these files. Right now the merge is an 'insert overwrite' in Hive, which
requires decompressing the data and then recompressing it. This jira adds a
command to Hive that performs the merge without decompressing and recompressing
the data.

Something like "alter table tbl_name [partition ()] merge files". In this jira
the new command will only support RCFile, since it requires some new APIs in
the file format.


> Block merge for RCFile
> ----------------------
>
>                 Key: HIVE-1950
>                 URL: https://issues.apache.org/jira/browse/HIVE-1950
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: He Yongqiang
>            Assignee: He Yongqiang
>             Fix For: 0.8.0
>
>         Attachments: HIVE-1950.1.patch, HIVE-1950.2.patch, HIVE-1950.3.patch, 
> HIVE-1950.4.patch, HIVE-1950.5.patch, HIVE-1950.6.patch
>
>
> In our environment, there are many small files inside a single partition or
> table. To reduce the namenode load, we run a dedicated housekeeping job that
> merges these files. Right now the merge is an 'insert overwrite' in Hive,
> which requires decompressing the data and then recompressing it. This jira
> adds a command to Hive that performs the merge without decompressing and
> recompressing the data.
> Something like "alter table tbl_name [partition ()] concatenate". In this
> jira the new command will only support RCFile, since it requires some new
> APIs in the file format.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
