Re: I am trying to run a large job and it is consistently failing with timeout - nothing happens for 600 sec

Steve Lewis Wed, 18 Jan 2012 16:01:08 -0800

It always fails with a task timeout, and that error gives me very little
indication of where the error occurs. The one piece of data I have is that
if I only call context.write 1 in 100 times it does not time out, suggesting
that it is not MY code that is timing out.
I could try to time the write statements and see if they get slow, although
those might do something slow in another thread?? Or it might be in the
internal Hadoop data handling code.
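
One way to time the write statements is sketched below. This is only a
sketch under stated assumptions: TimedWrites, Sink and TimedSink are
hypothetical names, not Hadoop classes. Sink stands in for
context.write(key, value), and the Runnable stands in for a call such as
context.progress(). If a write blocks while the framework is busy in the
background, that should show up as a large maximum rather than a uniformly
slow average.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper, not part of Hadoop: times every write made through it. */
final class TimedWrites {

    /** Stand-in for context.write(key, value). */
    interface Sink<K, V> {
        void write(K key, V value) throws Exception;
    }

    /** Wraps a Sink, records per-call latency, and reports progress periodically. */
    static final class TimedSink<K, V> implements Sink<K, V> {
        private final Sink<K, V> delegate;
        private final Runnable progress;   // assumption: context::progress in a real mapper
        private final int progressEvery;
        private long count;
        private long totalNanos;
        private long maxNanos;

        TimedSink(Sink<K, V> delegate, Runnable progress, int progressEvery) {
            if (progressEvery <= 0) {
                throw new IllegalArgumentException("progressEvery must be positive");
            }
            this.delegate = delegate;
            this.progress = progress;
            this.progressEvery = progressEvery;
        }

        @Override
        public void write(K key, V value) throws Exception {
            long start = System.nanoTime();
            try {
                delegate.write(key, value);
            } finally {
                // Record the latency even if the underlying write throws.
                long elapsed = System.nanoTime() - start;
                count++;
                totalNanos += elapsed;
                if (elapsed > maxNanos) {
                    maxNanos = elapsed;
                }
            }
            // Signal liveness once every progressEvery successful writes.
            if (count % progressEvery == 0) {
                progress.run();
            }
        }

        long count() { return count; }
        long totalNanos() { return totalNanos; }
        long maxNanos() { return maxNanos; }

        String summary() {
            return String.format("writes=%d totalMs=%d maxMs=%d",
                    count, totalNanos / 1_000_000, maxNanos / 1_000_000);
        }
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger progressCalls = new AtomicInteger();
        // A no-op sink replaces the real context.write for this demonstration.
        TimedSink<String, String> sink =
                new TimedSink<>((k, v) -> { }, progressCalls::incrementAndGet, 100);
        for (int i = 0; i < 1000; i++) {
            sink.write("key" + i, "value");
        }
        System.out.println(sink.summary() + " progressCalls=" + progressCalls.get());
    }
}
```

In a real mapper the summary could be logged from cleanup(), so that a task
killed for inactivity still leaves earlier timings in its log only if they
are also printed periodically.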


On Wed, Jan 18, 2012 at 3:51 PM, Alex Kozlov <[email protected]> wrote:

> Does it always fail at the same place?  Does the task log show anything
> unusual?
>
> On Wed, Jan 18, 2012 at 3:46 PM, Steve Lewis <[email protected]>
> wrote:
>
> > I KNOW it is a task timeout - what I do NOT know is WHY merely cutting the
> > number of writes causes it to go away. It seems to imply that some
> > context.write operation or something downstream from that is taking a huge
> > amount of time and that is all hadoop internal code - not mine so my
> > question is why should increasing the number and volume of writes cause a
> > task to time out?
> >
> > On Wed, Jan 18, 2012 at 2:33 PM, Tom Melendez <[email protected]> wrote:
> >
> > > Sounds like mapred.task.timeout?  The default is 10 minutes.
> > >
> > > http://hadoop.apache.org/common/docs/current/mapred-default.html
> > >
> > > Thanks,
> > >
> > > Tom
> > >
> > > On Wed, Jan 18, 2012 at 2:05 PM, Steve Lewis <[email protected]>
> > > wrote:
> > > > The map tasks fail timing out after 600 sec.
> > > > I am processing one 9 GB file with 16,000,000 records. Each record (think
> > > > of it as a line) generates hundreds of key value pairs.
> > > > The job is unusual in that the output of the mapper in terms of records or
> > > > bytes is orders of magnitude larger than the input.
> > > > I have no idea what is slowing down the job except that the problem is in
> > > > the writes.
> > > >
> > > > If I change the job to merely bypass a fraction of the context.write
> > > > statements the job succeeds.
> > > > This is one map task that failed and one that succeeded - I cannot
> > > > understand how a write can take so long
> > > > or what else the mapper might be doing
> > > >
> > > > JOB FAILED WITH TIMEOUT
> > > >
> > > > *Parser*
> > > >   TotalProteins                  90,103
> > > >   NumberFragments            10,933,089
> > > > *FileSystemCounters*
> > > >   HDFS_BYTES_READ            67,245,605
> > > >   FILE_BYTES_WRITTEN        444,054,807
> > > > *Map-Reduce Framework*
> > > >   Combine output records     10,033,499
> > > >   Map input records              90,103
> > > >   Spilled Records            10,032,836
> > > >   Map output bytes        3,520,182,794
> > > >   Combine input records      10,844,881
> > > >   Map output records         10,933,089
> > > >
> > > > Same code but fewer writes
> > > > JOB SUCCEEDED
> > > >
> > > > *Parser*
> > > >   TotalProteins                  90,103
> > > >   NumberFragments           206,658,758
> > > > *FileSystemCounters*
> > > >   FILE_BYTES_READ           111,578,253
> > > >   HDFS_BYTES_READ            67,245,607
> > > >   FILE_BYTES_WRITTEN        220,169,922
> > > > *Map-Reduce Framework*
> > > >   Combine output records      4,046,128
> > > >   Map input records              90,103
> > > >   Spilled Records             4,046,128
> > > >   Map output bytes          662,354,413
> > > >   Combine input records       4,098,609
> > > >   Map output records          2,066,588
> > > >
> > > > Any bright ideas?
> > > > --
> > > > Steven M. Lewis PhD
> > > > 4221 105th Ave NE
> > > > Kirkland, WA 98033
> > > > 206-384-1340 (cell)
> > > > Skype lordjoe_com
> > >
> >
> >
> >
> >
>
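
The counters quoted in this thread can be reduced to a few per-record
ratios, which makes the difference between the failed and the succeeded
task easier to see. The sketch below only does that arithmetic on the
quoted numbers; CounterRatios and its methods are hypothetical names chosen
for the illustration, and HDFS_BYTES_READ is assumed to be the input size
of the map task.

```java
/** Hypothetical helper: derives ratios from the task counters quoted in this thread. */
final class CounterRatios {

    /** Plain floating-point ratio; the caller must pass a non-zero denominator. */
    static double ratio(long numerator, long denominator) {
        return (double) numerator / denominator;
    }

    /** Formats the three ratios of interest for one task. */
    static String describe(String label, long inputRecords, long inputBytes,
                           long outputRecords, long outputBytes) {
        return String.format(
                "%s: %.1f output records per input record, "
                        + "%.1f bytes per output record, %.1fx byte expansion",
                label,
                ratio(outputRecords, inputRecords),
                ratio(outputBytes, outputRecords),
                ratio(outputBytes, inputBytes));
    }

    public static void main(String[] args) {
        // Arguments: map input records, HDFS_BYTES_READ,
        // map output records, map output bytes.
        System.out.println(describe("failed",
                90_103L, 67_245_605L, 10_933_089L, 3_520_182_794L));
        System.out.println(describe("succeeded",
                90_103L, 67_245_607L, 2_066_588L, 662_354_413L));
    }
}
```

On these counters the bytes per output record come out at roughly 320 in
both tasks, so the two tasks differ mainly in how many records each input
record produces, not in the size of the individual records.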



-- 
Steven M. Lewis PhD
4221 105th Ave NE
Kirkland, WA 98033
206-384-1340 (cell)
Skype lordjoe_com
