I can't think of an easy way to do this. There are a few not-so-easy approaches:

* Implement numErrors as a Hadoop counter, and then have the application which submitted the job check the value of that counter once the job is complete and have the app throw an error if the counter exceeds the threshold. (Not exactly what you're asking, since this wouldn't cause the job to fail in error, but rather you would monitor the job and cause your app to fail in error if needed.)
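For what it's worth, the driver-side check might look roughly like this. This is a framework-free sketch: the actual Hadoop counter lookup is shown in comments, and checkErrorCounter and the counter names are hypothetical.

```java
// Sketch of the driver-side check: run the job, read the counter,
// and have the submitting app fail if the threshold was exceeded.
public class CounterCheckSketch {
    static final long ERROR_THRESHOLD = 10;

    // In a real driver you would do roughly:
    //   job.waitForCompletion(true);
    //   long numErrors = job.getCounters()
    //       .findCounter("MyCounters", "ERROR_RECORDS").getValue();
    //   checkErrorCounter(numErrors);
    static void checkErrorCounter(long numErrors) {
        if (numErrors >= ERROR_THRESHOLD) {
            throw new RuntimeException(
                "Job exceeded error threshold: " + numErrors);
        }
    }
}
```

Note that the job itself still completes "successfully" here; it's the submitting application that raises the error.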

* Store the numErrors counter externally - say in Apache Zookeeper or Redis - and have each map task increment the counter and fail the job if it exceeds the threshold. Again, though, some issues to work around here: due to speculative execution, Hadoop might submit extra map tasks, so this could throw off the counter. You'd have to make sure to only increment the counter when a map task completes successfully.
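A rough sketch of that shared-counter idea, with an AtomicLong standing in for the external store (in practice this would be, e.g., a Redis INCR key or a ZooKeeper znode; the class and method names here are made up):

```java
import java.util.concurrent.atomic.AtomicLong;

// Framework-free sketch of a job-wide error counter. The AtomicLong
// is a stand-in for an external store shared by all map tasks.
public class SharedErrorCounter {
    private final AtomicLong counter = new AtomicLong(); // external store stand-in
    private final long threshold;

    public SharedErrorCounter(long threshold) {
        this.threshold = threshold;
    }

    // Called by each map task per error record. Returns true once the
    // job-wide threshold has been crossed; the caller would then fail
    // its task (or explicitly kill the job).
    public boolean recordErrorAndCheck() {
        return counter.incrementAndGet() >= threshold;
    }
}
```

The speculative-execution caveat above still applies: a real implementation would need to avoid double-counting errors from duplicate task attempts.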

HTH,

DR

On 11/15/2011 02:46 PM, Mapred Learn wrote:
Hi Harsh,

My situation is to kill a job when this threshold is reached. If, say, the
threshold is 10, and 2 mappers combined reached this value, how should I
achieve this?

With what you are saying, I think the job will fail once a single mapper
reaches that threshold.

Thanks,


On Tue, Nov 15, 2011 at 11:22 AM, Harsh J <ha...@cloudera.com> wrote:

Mapred,

If you fail a task permanently upon encountering a bad situation, you
basically end up failing the job as well, automatically. By controlling the
number of retries (say, down to 1 or 2 from the default of 4 total attempts),
you can also have the job fail faster.

Is killing the job immediately a necessity? Why?

I s'pose you could call kill from within the mapper, but I've never seen
that as necessary in any situation so far. What's wrong with letting the job
auto-die as a result of a failing task?
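For reference, the retry count can be lowered via the mapred.map.max.attempts property (mapred.reduce.max.attempts for reducers), e.g. in the job config:

```xml
<property>
  <name>mapred.map.max.attempts</name>
  <value>2</value>
</property>
```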

  On 16-Nov-2011, at 12:38 AM, Mapred Learn wrote:

  Thanks David for a step-by-step response, but this makes the error
threshold a per-mapper threshold. Is there a way to make it per-job, so that
all mappers share this value and increment it as a shared counter?


On Tue, Nov 15, 2011 at 8:12 AM, David Rosenstrauch <dar...@darose.net> wrote:

  On 11/14/2011 06:06 PM, Mapred Learn wrote:

Hi,

I have a use case where I want to pass a threshold value to a map-reduce
job. E.g.: error records = 10.

I want the map-reduce job to fail if the total count of error_records across
the job, i.e. across all mappers, reaches this threshold.

How can I implement this, considering that each mapper would be processing
only some part of the input data?

Thanks,
-JJ


1) Pass in the threshold value as a configuration value of the M/R job.
(i.e., job.getConfiguration().setInt("error_threshold", 10) )

2) Make your mappers implement the Configurable interface.  This will
ensure that every mapper gets passed a copy of the config object.

3) When you implement the setConf() method in your mapper (which
Configurable will force you to do), retrieve the threshold value from the
config and save it in an instance variable in the mapper.  (i.e., int
errorThreshold = conf.getInt("error_threshold", 0) - note that getInt()
requires a default value as its second argument.)

4) In the mapper, when an error record occurs, increment a counter and
then check if the counter value exceeds the threshold.  If so, throw an
exception.  (e.g., if (++numErrors >= errorThreshold) throw new
RuntimeException("Too many errors") )

The exception will kill the mapper.  Hadoop will attempt to re-run it,
but subsequent attempts will also fail for the same reason, and eventually
the entire job will fail.
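Putting steps 1-4 together, a compilable stand-in for such a mapper might look like this. Note this is a sketch: a plain Map plays the role of the Configuration object, isErrorRecord() is a hypothetical check, and the real class would extend org.apache.hadoop.mapreduce.Mapper and implement org.apache.hadoop.conf.Configurable.

```java
import java.util.Map;

// Stand-in for the mapper described in steps 1-4: pull the threshold
// from the config, count errors per task, and throw once exceeded.
public class ThresholdMapperSketch {
    private int errorThreshold;
    private int numErrors;

    // Step 3: setConf() pulls the threshold out of the job config.
    public void setConf(Map<String, Integer> conf) {
        errorThreshold = conf.getOrDefault("error_threshold", 10);
    }

    // Step 4: per-record error handling inside map().
    public void handleRecord(String record) {
        if (isErrorRecord(record)) {
            if (++numErrors >= errorThreshold) {
                throw new RuntimeException("Too many errors: " + numErrors);
            }
        }
    }

    // Hypothetical error check; substitute your own record validation.
    private boolean isErrorRecord(String record) {
        return record.startsWith("ERR");
    }
}
```

Keep in mind this counts errors per mapper, which is exactly the per-mapper-vs-per-job distinction raised above.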

HTH,

DR





