Mapred, if you fail a task permanently upon encountering a bad situation, you basically end up failing the job automatically as well. By lowering the number of retries (say, down to 1 or 2 from the default of 4 total attempts), you can also have it fail the job faster.
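To make that concrete, here's a rough driver-side sketch. I'm assuming the Hadoop 0.20/1.x-era API and property names here (newer releases spell the properties mapreduce.map.maxattempts / mapreduce.reduce.maxattempts), and FailFastDriver is just a placeholder name:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class FailFastDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // Allow only one attempt per task instead of the default 4, so a task
    // that throws an exception takes the whole job down on its first failure.
    conf.setInt("mapred.map.max.attempts", 1);
    conf.setInt("mapred.reduce.max.attempts", 1);

    Job job = new Job(conf, "fail-fast example");
    // ... set mapper, reducer, input/output formats and paths as usual ...
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

With max attempts at 1, the first exception a task throws is also its last, and the job fails right away instead of burning through three more attempts.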
Is killing the job immediately a necessity? Why? I s'pose you could call kill from within the mapper, but I've never seen that as necessary in any situation so far. What's wrong with letting the job auto-die as a result of a failing task?

On 16-Nov-2011, at 12:38 AM, Mapred Learn wrote:

> Thanks David for a step-by-step response, but this makes the error threshold
> a per-mapper threshold. Is there a way to make it per job so that all mappers
> share this value and increment it as a shared counter?
>
> On Tue, Nov 15, 2011 at 8:12 AM, David Rosenstrauch <dar...@darose.net> wrote:
> On 11/14/2011 06:06 PM, Mapred Learn wrote:
> Hi,
>
> I have a use case where I want to pass a threshold value to a map-reduce
> job. For e.g.: error records = 10.
>
> I want the map-reduce job to fail if the total count of error records in
> the job, i.e. across all mappers, reaches this value.
>
> How can I implement this, considering that each mapper would be processing
> some part of the input data?
>
> Thanks,
> -JJ
>
> 1) Pass in the threshold value as a configuration value of the M/R job. (i.e.,
> job.getConfiguration().setInt("error_threshold", 10) )
>
> 2) Make your mappers implement the Configurable interface. This will ensure
> that every mapper gets passed a copy of the config object.
>
> 3) When you implement the setConf() method in your mapper (which Configurable
> will force you to do), retrieve the threshold value from the config and save
> it in an instance variable in the mapper. (i.e., int errorThreshold =
> conf.getInt("error_threshold") )
>
> 4) In the mapper, when an error record occurs, increment a counter and then
> check if the counter value exceeds the threshold. If so, throw an exception.
> (e.g., if (++numErrors >= errorThreshold) throw new RuntimeException("Too
> many errors") )
>
> The exception will kill the mapper. Hadoop will attempt to re-run it, but
> subsequent attempts will also fail for the same reason, and eventually the
> entire job will fail.
>
> HTH,
>
> DR
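P.S. For the archives, David's steps 1-4 boil down to something like the sketch below. I've used the new-API Mapper's setup() hook rather than Configurable/setConf (same effect), and the class name ThresholdMapper plus the isBadRecord() check are just placeholders for your own record validation. Note that Configuration.getInt() takes a default value as its second argument. The driver side stays as in step 1: job.getConfiguration().setInt("error_threshold", 10).

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class ThresholdMapper
    extends Mapper<LongWritable, Text, Text, NullWritable> {

  private int errorThreshold;
  private int numErrors = 0;

  @Override
  protected void setup(Context context) {
    // Step 3: read the job-wide setting (getInt needs a default value).
    errorThreshold = context.getConfiguration().getInt("error_threshold", 10);
  }

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    if (isBadRecord(value)) {
      // Step 4: count the bad records this mapper has seen and give up
      // once the limit is reached; the exception fails the task attempt.
      if (++numErrors >= errorThreshold) {
        throw new RuntimeException("Too many errors: " + numErrors);
      }
      return;
    }
    context.write(value, NullWritable.get());
  }

  // Placeholder validity check -- substitute your real parsing/validation.
  private boolean isBadRecord(Text value) {
    return value.toString().trim().isEmpty();
  }
}

As you noted, numErrors here is still a per-mapper count, not a job-wide total: each task attempt only counts the bad records it sees in its own split.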