[ https://issues.apache.org/jira/browse/HADOOP-1054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Enis Soztutar resolved HADOOP-1054.
-----------------------------------

    Resolution: Duplicate

HADOOP-1515 does exactly the same thing.

> Add more than one input file per map?
> -------------------------------------
>
>                 Key: HADOOP-1054
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1054
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.11.2
>            Reporter: Johan Oskarsson
>            Priority: Trivial
>
> I've got a problem with MapReduce overhead when it comes to small input files.
> Roughly 100 MB comes into the DFS every few hours. Then afterwards, data
> related to that batch might be added for another few weeks.
> The problem is that this data is roughly 4-5 KB per file, so for every
> reasonably big file we might have 4-5 small ones.
> As far as I understand it, each small file will get assigned a task of its
> own. This causes performance issues, since the overhead for such small
> files is pretty big.
> Would it be possible to have Hadoop assign multiple files to a map task, up
> to a configurable limit?
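The grouping asked for above (and which the duplicate, HADOOP-1515, addressed by introducing MultiFileInputFormat) amounts to packing small files into splits up to a configurable byte limit. A minimal sketch of that idea, not Hadoop's actual implementation:

```python
def pack_files(file_sizes, max_split_bytes):
    """Group (name, size) pairs into splits, each holding one or more
    files whose combined size stays under max_split_bytes; a file
    larger than the limit gets a split of its own."""
    splits, current, current_bytes = [], [], 0
    for name, size in file_sizes:
        # Flush the current split if adding this file would exceed the limit.
        if current and current_bytes + size > max_split_bytes:
            splits.append(current)
            current, current_bytes = [], 0
        current.append(name)
        current_bytes += size
    if current:
        splits.append(current)
    return splits
```

With a 12-byte limit, two 5-byte files share a split, an oversized file sits alone, and packing continues afterwards: `pack_files([("a", 5), ("b", 5), ("c", 100), ("d", 5)], 12)` yields `[["a", "b"], ["c"], ["d"]]`, so each map task would receive one split rather than one tiny file.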

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
