Custom Splitter for handling many small files
---------------------------------------------

                 Key: HADOOP-3387
                 URL: https://issues.apache.org/jira/browse/HADOOP-3387
             Project: Hadoop Core
          Issue Type: Improvement
          Components: mapred
            Reporter: Subramaniam Krishnan
             Fix For: 0.18.0



By default, Hadoop allocates one Map per file irrespective of its size. This 
is not optimal when a job has a large number of small files: for example, a 
job over 2000 100 KB files will be allocated 2000 Maps.

The Custom Multi File Splitter collapses small files into a single split 
until the DFS Block Size is reached. 
It also takes care of big files by splitting them on the Block Size and 
packing any remainders into further splits of up to the Block Size. 
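The sketch below illustrates the splitting logic described above; it is not 
the actual patch, and the class and field names (Chunk, Split, computeSplits) 
are hypothetical, not part of the Hadoop mapred API.

{code}
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative sketch of the described splitting logic: small files are
 * packed into one split until the DFS block size is reached, while big
 * files are cut into block-sized pieces and their remainders are packed
 * together with other small pieces.
 */
public class MultiFileSplitSketch {

    /** A (path, offset, length) piece of one file. */
    static class Chunk {
        final String path;
        final long offset;
        final long length;
        Chunk(String path, long offset, long length) {
            this.path = path;
            this.offset = offset;
            this.length = length;
        }
    }

    /** A split groups chunks whose total size is at most the block size. */
    static class Split {
        final List<Chunk> chunks = new ArrayList<Chunk>();
        long totalBytes = 0;
    }

    static List<Split> computeSplits(List<String> paths, List<Long> sizes,
                                     long blockSize) {
        List<Split> splits = new ArrayList<Split>();
        Split current = new Split();

        for (int i = 0; i < paths.size(); i++) {
            String path = paths.get(i);
            long remaining = sizes.get(i);
            long offset = 0;

            // Big files: emit full-block splits, keep the remainder for packing.
            while (remaining >= blockSize) {
                Split blockSplit = new Split();
                blockSplit.chunks.add(new Chunk(path, offset, blockSize));
                blockSplit.totalBytes = blockSize;
                splits.add(blockSplit);
                offset += blockSize;
                remaining -= blockSize;
            }

            // Small files and remainders: pack into the current split until
            // it would exceed the block size, then start a new split.
            if (remaining > 0) {
                if (current.totalBytes + remaining > blockSize) {
                    splits.add(current);
                    current = new Split();
                }
                current.chunks.add(new Chunk(path, offset, remaining));
                current.totalBytes += remaining;
            }
        }
        if (!current.chunks.isEmpty()) {
            splits.add(current);
        }
        return splits;
    }
}
{code}

With this packing, the 2000 x 100 KB example above would collapse into a 
handful of block-sized splits instead of 2000 single-file splits.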

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
