Custom Splitter for handling many small files
---------------------------------------------
Key: HADOOP-3387
URL: https://issues.apache.org/jira/browse/HADOOP-3387
Project: Hadoop Core
Issue Type: Improvement
Components: mapred
Reporter: Subramaniam Krishnan
Fix For: 0.18.0
Hadoop by default allocates at least one Map per file irrespective of its size. This is
not optimal when a job has a large number of small files; for example, 2000 files of
100KB each will result in 2000 Maps being allocated for the job.
The Custom Multi File Splitter collapses small files into a single split until the DFS
Block Size is reached.
It also handles big files by splitting them on the Block Size and packing any
remainders (if any) into further Block-Size splits.
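
A rough sketch of the packing behaviour described above (not the attached patch; the
class and method names SmallFilePacker, Chunk and pack are illustrative only):

import java.util.ArrayList;
import java.util.List;

public class SmallFilePacker {

    /** A contiguous region of one file assigned to a split. */
    public static class Chunk {
        public final String path;
        public final long offset;
        public final long length;
        Chunk(String path, long offset, long length) {
            this.path = path;
            this.offset = offset;
            this.length = length;
        }
    }

    /**
     * Pack files into splits whose total size does not exceed blockSize.
     * fileLengths[i] is the length in bytes of files[i].
     */
    public static List<List<Chunk>> pack(String[] files, long[] fileLengths, long blockSize) {
        List<List<Chunk>> splits = new ArrayList<>();
        List<Chunk> current = new ArrayList<>();
        long currentSize = 0;

        for (int i = 0; i < files.length; i++) {
            long remaining = fileLengths[i];
            long offset = 0;
            while (remaining > 0) {
                // Take as much of this file as still fits in the current split;
                // a file larger than a block is cut into block-sized chunks.
                long take = Math.min(remaining, blockSize - currentSize);
                current.add(new Chunk(files[i], offset, take));
                currentSize += take;
                offset += take;
                remaining -= take;
                // Close the split once a full block's worth of data is collected.
                if (currentSize == blockSize) {
                    splits.add(current);
                    current = new ArrayList<>();
                    currentSize = 0;
                }
            }
        }
        // Flush the last, possibly partial, split.
        if (!current.isEmpty()) {
            splits.add(current);
        }
        return splits;
    }
}

With the 2000 x 100KB example and a 64MB block size, this packing yields roughly 4
splits instead of 2000 Maps.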