merge small files whenever possible
-----------------------------------
Key: HIVE-439
URL: https://issues.apache.org/jira/browse/HIVE-439
Project: Hadoop Hive
Issue Type: Improvement
Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain
There are cases where the input to a Hive job consists of thousands of small
files. In this case, a mapper is spawned for each file. Most of the overhead of
spawning all these mappers can be avoided if the small files are combined into
fewer, larger files.
The problem can also be addressed by having a mapper span multiple blocks, as in:
https://issues.apache.org/jira/browse/HIVE-74
But it also makes sense in Hive to merge files whenever possible.
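
To make the potential savings concrete, here is a rough sketch of how small
files might be grouped into fewer, larger outputs. It is not Hive's
implementation; the function name plan_merges, the 128 MB target, and the file
names are all hypothetical and chosen only for illustration. The idea is a
simple greedy pass that starts a new group whenever adding the next file would
exceed the target size.

```python
from typing import Dict, List


def plan_merges(file_sizes: Dict[str, int], target_bytes: int) -> List[List[str]]:
    """Greedily group files so that each group totals at most target_bytes.

    Hypothetical helper for illustration only. A file larger than
    target_bytes ends up alone in its own group.
    """
    groups: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for name, size in sorted(file_sizes.items()):
        # Close the current group if this file would push it past the target.
        if current and current_size + size > target_bytes:
            groups.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        groups.append(current)
    return groups


# Assumed workload: 1000 input files of 1 MB each, merged toward 128 MB outputs.
sizes = {f"part-{i:05d}": 1 << 20 for i in range(1000)}
plan = plan_merges(sizes, 128 << 20)
print(len(sizes), "files ->", len(plan), "merged files")
```

Under these assumed sizes, a later job reading the merged output would need
one mapper per merged file rather than one per original small file.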