You can make two passes over the data.
The first map-reduce pass sanity-checks the data.
The second map-reduce pass does the real work, and runs only if the first pass 
accepts the file.

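You can use counters for this: define an enum type for the error record 
categories.
In the mapper, parse each line and use the result to increment the matching 
counter.
The framework aggregates counters across all mappers, so once the first pass 
finishes, the driver can compare the total against the threshold.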
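
The counting and threshold check described above could be sketched roughly as 
follows. This is plain Java with no Hadoop dependency, so it only illustrates 
the logic. The RecordError enum, the three-field comma-separated record format, 
and the method names are all hypothetical. In a real job the increment would go 
through context.getCounter(error).increment(1) in the mapper, and the driver 
would read the totals with job.getCounters().findCounter(error).getValue() 
after the first pass completes.

```java
import java.util.Arrays;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

public class ErrorThresholdSketch {
    // Hypothetical error categories; in Hadoop each constant maps to one counter.
    public enum RecordError { MALFORMED, MISSING_FIELD }

    // Assumed record format: exactly three comma-separated, non-empty fields.
    // Returns null when the line is valid.
    public static RecordError classify(String line) {
        String[] fields = line.split(",", -1);
        if (fields.length != 3) {
            return RecordError.MALFORMED;
        }
        for (String field : fields) {
            if (field.trim().isEmpty()) {
                return RecordError.MISSING_FIELD;
            }
        }
        return null;
    }

    // Stands in for the per-category counters that mappers would update.
    public static Map<RecordError, Long> countErrors(List<String> lines) {
        Map<RecordError, Long> counters = new EnumMap<>(RecordError.class);
        for (String line : lines) {
            RecordError error = classify(line);
            if (error != null) {
                counters.merge(error, 1L, Long::sum);
            }
        }
        return counters;
    }

    // Driver-side check: reject the file once the total reaches the threshold.
    public static boolean acceptFile(Map<RecordError, Long> counters, long threshold) {
        long total = 0;
        for (long count : counters.values()) {
            total += count;
        }
        return total < threshold;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a,b,c", "a,b", "a,,c", "x,y,z");
        Map<RecordError, Long> counters = countErrors(lines);
        System.out.println("error counts: " + counters);
        System.out.println("run second pass: " + acceptFile(counters, 10));
    }
}
```

Because the check happens in the driver after the first pass, the second job 
is simply never submitted when acceptFile returns false.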

-Mingxi

From: Mapred Learn [mailto:mapred.le...@gmail.com]
Sent: Monday, November 14, 2011 3:06 PM
To: mapreduce-user@hadoop.apache.org
Subject: how to implement error thresholds in a map-reduce job ?

Hi,

I have a use case where I want to pass a threshold value to a map-reduce job, 
e.g. error_records=10.

I want the map-reduce job to fail if the total count of error_records across 
all mappers in the job reaches that threshold.

How can I implement this, considering that each mapper processes only part of 
the input data?

Thanks,
-JJ
