Perhaps you are not reporting progress throughout your task. If you run a large enough job, you will hit the default timeout, mapred.task.timeout (which defaults to 10 minutes). Consider reporting progress in your mapper/reducer by calling progress() on the Reporter object. See tip 7 of this post:
http://www.cloudera.com/blog/2009/05/10-mapreduce-tips/
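
With the new (org.apache.hadoop.mapreduce) API the equivalent call is context.progress(). A minimal sketch of the idea - the mapper types and the per-record fragmentation method here are placeholders, not your actual code:

    import java.io.IOException;
    import java.util.Arrays;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Hypothetical mapper: emits many key/value pairs per input record,
    // pinging the framework periodically so the task is not declared dead.
    public class FragmentMapper extends Mapper<LongWritable, Text, Text, Text> {

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            int emitted = 0;
            for (String fragment : expensiveFragmentation(value.toString())) {
                context.write(new Text(fragment), value);
                // Tell the tasktracker we are still alive. If no progress is
                // reported for mapred.task.timeout milliseconds (10 minutes
                // by default), the task is killed.
                if (++emitted % 1000 == 0) {
                    context.progress();
                }
            }
        }

        // Stand-in for whatever per-record work generates the fragments.
        private Iterable<String> expensiveFragmentation(String record) {
            return Arrays.asList(record.split(" "));
        }
    }

Calling progress() every N output records keeps the overhead negligible while still pinging the tasktracker well inside the timeout window.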
Hope that helps,
-Leo

Sent from my phone

On Jan 18, 2012, at 6:46 PM, Steve Lewis <[email protected]> wrote:

> I KNOW it is a task timeout - what I do NOT know is WHY merely cutting the
> number of writes makes it go away. It seems to imply that some
> context.write operation, or something downstream from it, is taking a huge
> amount of time, and that is all Hadoop internal code - not mine. So my
> question is: why should increasing the number and volume of writes cause a
> task to time out?
>
> On Wed, Jan 18, 2012 at 2:33 PM, Tom Melendez <[email protected]> wrote:
>
>> Sounds like mapred.task.timeout? The default is 10 minutes.
>>
>> http://hadoop.apache.org/common/docs/current/mapred-default.html
>>
>> Thanks,
>>
>> Tom
>>
>> On Wed, Jan 18, 2012 at 2:05 PM, Steve Lewis <[email protected]> wrote:
>>> The map tasks fail, timing out after 600 sec.
>>> I am processing one 9 GB file with 16,000,000 records. Each record
>>> (think of it as a line) generates hundreds of key/value pairs.
>>> The job is unusual in that the output of the mapper, in records and
>>> in bytes, is orders of magnitude larger than the input.
>>> I have no idea what is slowing down the job except that the problem
>>> is in the writes.
>>>
>>> If I change the job to merely bypass a fraction of the context.write
>>> statements, the job succeeds.
>>> Here are the counters for one map task that failed and one that
>>> succeeded - I cannot understand how a write can take so long, or what
>>> else the mapper might be doing.
>>>
>>> JOB FAILED WITH TIMEOUT
>>>
>>> Parser:
>>>   TotalProteins           90,103
>>>   NumberFragments         10,933,089
>>> FileSystemCounters:
>>>   HDFS_BYTES_READ         67,245,605
>>>   FILE_BYTES_WRITTEN      444,054,807
>>> Map-Reduce Framework:
>>>   Map input records       90,103
>>>   Map output records      10,933,089
>>>   Map output bytes        3,520,182,794
>>>   Combine input records   10,844,881
>>>   Combine output records  10,033,499
>>>   Spilled Records         10,032,836
>>>
>>> Same code but fewer writes:
>>>
>>> JOB SUCCEEDED
>>>
>>> Parser:
>>>   TotalProteins           90,103
>>>   NumberFragments         206,658,758
>>> FileSystemCounters:
>>>   FILE_BYTES_READ         111,578,253
>>>   HDFS_BYTES_READ         67,245,607
>>>   FILE_BYTES_WRITTEN      220,169,922
>>> Map-Reduce Framework:
>>>   Map input records       90,103
>>>   Map output records      2,066,588
>>>   Map output bytes        662,354,413
>>>   Combine input records   4,098,609
>>>   Combine output records  4,046,128
>>>   Spilled Records         4,046,128
>>>
>>> Any bright ideas?
>>> --
>>> Steven M. Lewis PhD
>>> 4221 105th Ave NE
>>> Kirkland, WA 98033
>>> 206-384-1340 (cell)
>>> Skype lordjoe_com
>>
>
>
> --
> Steven M. Lewis PhD
> 4221 105th Ave NE
> Kirkland, WA 98033
> 206-384-1340 (cell)
> Skype lordjoe_com
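
For the record, the configuration-side stopgap is simply to raise the timeout in the job driver. A minimal sketch, assuming a plain Hadoop Configuration; the 30-minute value is just an example:

    import org.apache.hadoop.conf.Configuration;

    public class TimeoutConfig {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // mapred.task.timeout is in milliseconds; 0 disables the check.
            // Raising it only hides the symptom - reporting progress from the
            // mapper is the real fix.
            conf.setLong("mapred.task.timeout", 30 * 60 * 1000L); // 30 min vs. the 10 min default
            System.out.println("timeout = " + conf.get("mapred.task.timeout"));
        }
    }

The same property can be set per run with -Dmapred.task.timeout=1800000 on the command line if the driver goes through ToolRunner.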
