Thanks Tom! Didn't see your post before posting =)

On Thu, Oct 21, 2010 at 1:28 PM, ed <[email protected]> wrote:
> Sorry to keep spamming this thread. It looks like the correct way to
> implement MapRunnable using the new mapreduce classes (instead of the
> deprecated mapred) is to override the run() method of the mapper class.
> This is actually nice and convenient since everyone should already be
> using the Mapper class (org.apache.hadoop.mapreduce.Mapper<KEYIN, VALUEIN,
> KEYOUT, VALUEOUT>) for their mappers.
>
> ~Ed
>
> On Thu, Oct 21, 2010 at 12:14 PM, ed <[email protected]> wrote:
>
>> Just checked the Hadoop 0.21.0 API docs (I was looking in the wrong docs
>> before) and it doesn't look like MapRunner is deprecated, so I'll try
>> catching the error there and will report back if it's a good solution.
>> Thanks!
>>
>> ~Ed
>>
>> On Thu, Oct 21, 2010 at 11:23 AM, ed <[email protected]> wrote:
>>
>>> Hello,
>>>
>>> The MapRunner class looks promising. I noticed it is in the deprecated
>>> mapred package but I didn't see an equivalent class in the mapreduce
>>> package. Is this going to be ported to mapreduce or is it no longer
>>> being supported? Thanks!
>>>
>>> ~Ed
>>>
>>> On Thu, Oct 21, 2010 at 6:36 AM, Harsh J <[email protected]> wrote:
>>>
>>>> If it occurs eventually as your record reader reads it, then you may
>>>> use a MapRunner class instead of a Mapper interface/subclass. This
>>>> way, you can try/catch over the record reader itself and call your
>>>> map function only on valid next() calls. I think this ought to work.
>>>>
>>>> You can set it via JobConf.setMapRunnerClass(...).
>>>>
>>>> Ref: MapRunner API @
>>>> http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapred/MapRunner.html
>>>>
>>>> On Wed, Oct 20, 2010 at 4:14 AM, ed <[email protected]> wrote:
>>>> > Hello,
>>>> >
>>>> > I have a simple MapReduce job that reads in zipped files and
>>>> > converts them to LZO compression. Some of the files are not
>>>> > properly zipped, which results in Hadoop throwing a
>>>> > "java.io.EOFException: Unexpected end of input stream" error and
>>>> > causes the job to fail. Is there a way to catch this exception and
>>>> > tell Hadoop to just ignore the file and move on? I think the
>>>> > exception is being thrown by the class reading in the gzip file and
>>>> > not my mapper class. Is this correct? Is there a way to handle this
>>>> > type of error gracefully?
>>>> >
>>>> > Thank you!
>>>> >
>>>> > ~Ed
>>>>
>>>> --
>>>> Harsh J
>>>> www.harshj.com
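
A minimal sketch of the run()-override approach described at the top of the
thread, against the new org.apache.hadoop.mapreduce API in Hadoop 0.21. The
class name, the pass-through map(), and the counter names are illustrative,
not from this thread; the point is that context.nextKeyValue() is where a
truncated gzip stream throws, so guarding that loop lets the task finish
cleanly instead of failing the job:

import java.io.EOFException;
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Illustrative mapper: same loop as the default Mapper.run(), but the
// record-reader calls are guarded so a truncated gzip stream ends the
// task cleanly instead of killing the job.
public class SkipCorruptGzipMapper
        extends Mapper<LongWritable, Text, LongWritable, Text> {

    @Override
    public void run(Context context) throws IOException, InterruptedException {
        setup(context);
        try {
            // nextKeyValue() is where the decompressor reads the underlying
            // stream, so the EOFException from a bad file surfaces here.
            while (context.nextKeyValue()) {
                map(context.getCurrentKey(), context.getCurrentValue(), context);
            }
        } catch (EOFException e) {
            // Corrupt or truncated input: record it and move on.
            context.getCounter("SkipCorruptGzipMapper", "corrupt-files")
                   .increment(1);
        } finally {
            cleanup(context);
        }
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Pass-through map; the real job's logic would go here.
        context.write(key, value);
    }
}

Since gzip is not splittable, each map task gets a whole file as its split,
so catching the EOFException here skips just the remainder of the one bad
file while the other tasks proceed normally.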

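And a comparable sketch of Harsh's MapRunner suggestion for the older mapred
API, per the 0.20.2 MapRunnable interface linked above. Again, the class and
counter names are made up for illustration; it mirrors the stock MapRunner
loop with the record-reader calls guarded:

import java.io.EOFException;
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapRunnable;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.util.ReflectionUtils;

// Illustrative runner: one bad gzip file skips the rest of its split
// instead of failing the whole job.
public class SkipCorruptGzipRunner
        implements MapRunnable<LongWritable, Text, LongWritable, Text> {

    private Mapper<LongWritable, Text, LongWritable, Text> mapper;

    @SuppressWarnings("unchecked")
    @Override
    public void configure(JobConf job) {
        // Instantiate whatever Mapper the job has configured.
        this.mapper = ReflectionUtils.newInstance(job.getMapperClass(), job);
    }

    @Override
    public void run(RecordReader<LongWritable, Text> input,
                    OutputCollector<LongWritable, Text> output,
                    Reporter reporter) throws IOException {
        try {
            LongWritable key = input.createKey();
            Text value = input.createValue();
            try {
                // next() drives the decompressor, so a truncated gzip
                // stream throws EOFException out of this call.
                while (input.next(key, value)) {
                    mapper.map(key, value, output, reporter);
                }
            } catch (EOFException e) {
                // Skip the rest of this corrupt split and finish cleanly.
                reporter.incrCounter("SkipCorruptGzipRunner",
                                     "corrupt-files", 1);
            }
        } finally {
            mapper.close();
        }
    }
}

It would be wired in with conf.setMapRunnerClass(SkipCorruptGzipRunner.class),
as Harsh notes; the runner then wraps whatever Mapper the job already uses.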