[ https://issues.apache.org/jira/browse/HADOOP-960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12468747 ]
Doug Cutting commented on HADOOP-960:
-------------------------------------
> it seems like the need to split input evenly would be pretty common
Can you say more about why you think this is important and useful? It's not
obvious to me.
Also, your original complaint was about the *number* of splits not matching
what you expect. Now you're complaining about the *size* of the splits not
being even. Which do you need? Both? Why? If you pass one big file and one
little file and ask for six splits, should it break each file into three, or
break the bigger file into four and the smaller into two? How should file size
be measured: number of records or number of bytes? There are myriad
possibilities. The base class implements something that should work well in
many cases by default, and it has some knobs that make it somewhat flexible,
but it's not well documented.
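[Editorial note: to make the "knobs" concrete, here is a minimal sketch of how
a split size could be chosen from the requested map count, a minimum split
size, and the filesystem block size. This is an illustration under those
assumptions only, not the actual Hadoop source, and all names are made up.]

    // Minimal sketch, not the actual Hadoop code: assumes the split size is
    // chosen from a goal size (total bytes / requested map tasks), a
    // configurable minimum split size, and the filesystem block size.
    public class SplitSizeSketch {
        static long computeSplitSize(long goalSize, long minSize, long blockSize) {
            // never smaller than the configured minimum, never larger than a block
            return Math.max(minSize, Math.min(goalSize, blockSize));
        }

        public static void main(String[] args) {
            long totalBytes = 640L * 1024 * 1024;   // hypothetical total input
            int requestedMaps = 16;                 // mapred.map.tasks hint
            long blockSize = 64L * 1024 * 1024;     // hypothetical block size
            long goalSize = totalBytes / requestedMaps;
            System.out.println("chosen split size: "
                + computeSplitSize(goalSize, 1, blockSize) + " bytes");
        }
    }

Under these assumptions, the requested map count is only a hint: the minimum
split size and the block size can both override it.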
> Incorrect number of map tasks when there are multiple input files
> -----------------------------------------------------------------
>
> Key: HADOOP-960
> URL: https://issues.apache.org/jira/browse/HADOOP-960
> Project: Hadoop
> Issue Type: Improvement
> Components: documentation
> Affects Versions: 0.10.1
> Reporter: Andrew McNabb
> Priority: Minor
>
> This problem happens with hadoop-streaming and possibly elsewhere. If there
> are 5 input files, the job creates 130 map tasks, even when
> mapred.map.tasks=128. The number of map tasks is incorrectly set to a
> multiple of the number of files. (I wrote a much more complete bug report,
> but Jira lost it when it hit an error, so I'm not in the mood to write it
> all again.)
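[Editorial note: for illustration only, and not part of the original report:
the 5-files / 128-requested / 130-actual numbers are consistent with per-file
rounding, assuming each file is split independently into
ceil(fileLength / goalSize) pieces. The file sizes below are hypothetical.]

    // Illustrative sketch only: if each file is split on its own and partial
    // chunks round up, 5 files with 128 requested maps can yield 130 splits.
    public class SplitCountSketch {
        public static void main(String[] args) {
            long[] fileLengths = {1000, 1000, 1000, 1000, 1000}; // hypothetical
            int requestedMaps = 128;                             // mapred.map.tasks
            long totalSize = 0;
            for (long len : fileLengths) totalSize += len;
            long goalSize = Math.max(1, totalSize / requestedMaps);
            int splits = 0;
            for (long len : fileLengths) {
                splits += (int) ((len + goalSize - 1) / goalSize); // ceil per file
            }
            System.out.println("requested=" + requestedMaps + ", actual=" + splits);
        }
    }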
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.