If the exception is thrown while the record reader is reading the input, you can use a MapRunner class instead of a plain Mapper interface/subclass. That way you drive the record reader yourself, can wrap its next() calls in a try/catch, and call your map function only on records that were read successfully. I think this ought to work; a rough sketch follows.
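Untested sketch (the class and counter names are made up, adjust the generics to your job); it skips the rest of a split when next() throws an EOFException, e.g. from a truncated gzip file:

import java.io.EOFException;
import java.io.IOException;

import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapRunnable;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.util.ReflectionUtils;

// Skips the remainder of a split when the record reader hits an
// EOFException (e.g. a truncated gzip file) instead of failing the task.
public class SkipTruncatedInputRunner<K1, V1, K2, V2>
    implements MapRunnable<K1, V1, K2, V2> {

  private Mapper<K1, V1, K2, V2> mapper;

  @SuppressWarnings("unchecked")
  public void configure(JobConf job) {
    // Instantiate the job's configured mapper, same as MapRunner does.
    this.mapper = (Mapper<K1, V1, K2, V2>)
        ReflectionUtils.newInstance(job.getMapperClass(), job);
  }

  public void run(RecordReader<K1, V1> input,
                  OutputCollector<K2, V2> output,
                  Reporter reporter) throws IOException {
    try {
      K1 key = input.createKey();
      V1 value = input.createValue();
      while (true) {
        try {
          if (!input.next(key, value)) {
            break; // clean end of the split
          }
        } catch (EOFException e) {
          // Corrupt/truncated input: count it and skip the rest of
          // this split rather than killing the task.
          reporter.incrCounter("SkipTruncatedInputRunner",
              "truncated-inputs", 1);
          break;
        }
        mapper.map(key, value, output, reporter);
      }
    } finally {
      mapper.close();
    }
  }
}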
You can set it via JobConf.setMapRunnerClass(...). Ref: MapRunner API @ http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapred/MapRunner.html

On Wed, Oct 20, 2010 at 4:14 AM, ed <[email protected]> wrote:
> Hello,
>
> I have a simple map-reduce job that reads in zipped files and converts
> them to lzo compression. Some of the files are not properly zipped, which
> results in Hadoop throwing a "java.io.EOFException: Unexpected end of
> input stream" error and causes the job to fail. Is there a way to catch
> this exception and tell Hadoop to just ignore the file and move on? I
> think the exception is being thrown by the class reading in the Gzip
> file, not by my mapper class. Is this correct? Is there a way to handle
> this type of error gracefully?
>
> Thank you!
>
> ~Ed

--
Harsh J
www.harshj.com
