[ https://issues.apache.org/jira/browse/HIVE-25837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yao Guangdong updated HIVE-25837:
---------------------------------
Attachment: HIVE-25837.0001.patch
> Hive merge file operation may consume long time
> -----------------------------------------------
>
> Key: HIVE-25837
> URL: https://issues.apache.org/jira/browse/HIVE-25837
> Project: Hive
> Issue Type: Improvement
> Components: Hive
> Affects Versions: All Versions
> Reporter: Yao Guangdong
> Priority: Major
> Attachments: HIVE-25837.0001.patch
>
>
> Hive's file merge can take a very long time in some cases. This happens
> when there are thousands, tens of thousands, or even more small files, most
> of them only a few KB each. The current merge implementation considers only
> the target size (default 256 MB), so a single map may end up merging
> thousands, tens of thousands, or more small files, which takes far too long.
> To address this, we change the code to consider not only the target size but
> also the number of files merged per map (default 1024 per map). This may
> produce merged files smaller than the user's configured target size, but
> given how much merge time it saves, I think users can accept that.
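To make the trade-off concrete: with 4 KB files and a 256 MB target, a size-only rule lets one map merge up to 256 MB / 4 KB = 65,536 files. A cap of 1024 files per map limits each merged output to about 1024 x 4 KB = 4 MB, which is the smaller-than-target result the description mentions. The following is a minimal, hypothetical sketch of grouping by both limits. It is not the code in HIVE-25837.0001.patch. The class `MergeGrouping`, the method `group`, and the parameter `maxFilesPerGroup` are illustrative names, and real Hive merge planning (input splits, block locations) is not modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch (not the HIVE-25837 patch): packs small files into merge
 * groups, one group per map task. A group is closed when either its accumulated
 * size reaches targetSize or it already holds maxFilesPerGroup files.
 */
public class MergeGrouping {

    /** Returns groups of file indexes; each inner list would be merged by one map. */
    public static List<List<Integer>> group(long[] fileSizes, long targetSize, int maxFilesPerGroup) {
        List<List<Integer>> groups = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        long currentSize = 0;
        for (int i = 0; i < fileSizes.length; i++) {
            current.add(i);
            currentSize += fileSizes[i];
            // Close the group as soon as either limit (size or file count) is reached.
            if (currentSize >= targetSize || current.size() >= maxFilesPerGroup) {
                groups.add(current);
                current = new ArrayList<>();
                currentSize = 0;
            }
        }
        if (!current.isEmpty()) {
            groups.add(current); // leftover files form a final, smaller group
        }
        return groups;
    }

    public static void main(String[] args) {
        long target = 256L * 1024 * 1024;      // 256 MB target size
        long[] sizes = new long[100_000];
        Arrays.fill(sizes, 4L * 1024);         // 100,000 small files of 4 KB each

        // Size-only rule: the file-count cap is effectively disabled.
        List<List<Integer>> sizeOnly = group(sizes, target, Integer.MAX_VALUE);
        // Size rule plus a cap of 1024 files per map.
        List<List<Integer>> capped = group(sizes, target, 1024);

        System.out.println("size only: " + sizeOnly.size() + " maps, first map merges "
                + sizeOnly.get(0).size() + " files");
        System.out.println("capped:    " + capped.size() + " maps, first map merges "
                + capped.get(0).size() + " files");
    }
}
```

Running `main` should print `size only: 2 maps, first map merges 65536 files` and `capped:    98 maps, first map merges 1024 files`. The cap spreads the same 100,000 files across many more maps, so the work runs in parallel instead of being concentrated in one very slow map.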
--
This message was sent by Atlassian Jira
(v8.20.1#820001)