Here is a somewhat different but related issue:

It would be useful to make the framework distinguish between deterministic and non-deterministic failures and react differently to them.

E.g.:

-- In streaming, a Perl script has a syntax error. There is no need to check for this 4*300 times.
-- The same exception (with the same stack) is thrown while processing the same record. (Google's MapReduce is reportedly capable of skipping the offending record on the next attempt, but short of that, why keep trying?)
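
To make the idea concrete, here is a minimal sketch of the kind of check I mean - hypothetical code, nothing like it exists in the framework today: fail a task outright once two consecutive attempts die with an identical exception and stack.

    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical helper (not Hadoop code): remembers the last failure
    // trace per task and flags a repeat as deterministic.
    public class DeterministicFailureDetector {
        private final Map<String, String> lastTrace =
            new HashMap<String, String>();

        // Returns true if the task should be failed without further retries.
        public boolean isDeterministic(String taskId, Throwable failure) {
            String trace = render(failure);
            String previous = lastTrace.put(taskId, trace);
            return trace.equals(previous);
        }

        private static String render(Throwable t) {
            StringBuilder sb = new StringBuilder(t.toString());
            for (StackTraceElement e : t.getStackTrace()) {
                sb.append('\n').append(e.toString());
            }
            return sb.toString();
        }
    }

A syntax error in a streaming script would trip this on the second attempt rather than the 4*300th.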

(Of course, this is just an optimization, while HADOOP-1304 is functionality one cannot do without....)

-- ab

On Apr 30, 2007, at 12:34 PM, Arun C Murthy (JIRA) wrote:


[ https://issues.apache.org/jira/browse/HADOOP-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12492766 ]

Arun C Murthy commented on HADOOP-1304:
---------------------------------------

One concern with this 'feature' is that we want a reasonable cap on what the user can set max attempts to; otherwise we could have a situation where a user unknowingly, not maliciously, sets it to a very large value - thus the framework would be vulnerable to one wrongly configured job hogging the cluster...
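
For illustration, such a cap could be enforced where the framework reads the job's configuration - a sketch only, with both property names invented for the example:

    import org.apache.hadoop.mapred.JobConf;

    // Sketch: clamp the user's requested attempt count to an
    // administrator-set ceiling, so one misconfigured job cannot
    // hog the cluster with retries. Property names are illustrative.
    public class AttemptCap {
        static int effectiveAttempts(JobConf conf) {
            int requested = conf.getInt("mapred.map.max.attempts", 4);
            int cap = conf.getInt("mapred.cluster.max.attempts.cap", 16);
            return Math.min(requested, cap);
        }
    }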

Also, as per a discussion with Doug, we could follow Lucene's convention of classifying this knob as 'Expert' so as to clearly elucidate its importance...

MAX_TASK_FAILURES should be configurable
----------------------------------------

                Key: HADOOP-1304
                URL: https://issues.apache.org/jira/browse/HADOOP-1304
            Project: Hadoop
         Issue Type: Improvement
         Components: mapred
   Affects Versions: 0.12.3
           Reporter: Christian Kunz
        Assigned To: Devaraj Das
        Attachments: 1304.patch, 1304.patch


After a couple of weeks of failed attempts, I was able to finish a large job only after I changed MAX_TASK_FAILURES to a higher value. In light of HADOOP-1144 (allowing a certain number of task failures without failing the job), it would be even better if this value could be configured separately for mappers and reducers, because the success of a job often requires the success of all reducers but not of all mappers.
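
For illustration, separate knobs might be set like this (the property names are made up for the example; the attached patch defines the real ones):

    // Sketch: separate retry caps for maps and reduces. Maps tolerate
    // more retries here than reduces, matching the observation that a
    // job can often survive failed map attempts but not failed reducers.
    JobConf job = new JobConf();   // org.apache.hadoop.mapred.JobConf
    job.setInt("mapred.map.max.attempts", 8);
    job.setInt("mapred.reduce.max.attempts", 4);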

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

