Does the syslog output from a should-have-failed task contain
something like this?

    java.lang.RuntimeException: PipeMapRed.waitOutputThreads():
subprocess failed with code 1

(In particular, I'm curious if it mentions the RuntimeException.)

Tasks that consume all their input and then exit non-zero are
definitely supposed to be counted as failed, so there's either a
problem with the setup or a bug somewhere.

rick

On Wed, May 14, 2008 at 8:49 PM, Andrey Pankov <[EMAIL PROTECTED]> wrote:
> Hi,
>
>  I've tested this new option "-jobconf
> stream.non.zero.exit.status.is.failure=true". It seems to work, but it's
> still not good enough for me. When the mapper/reducer program has read all
> of its input data successfully and fails after that, streaming still
> finishes successfully, so there is no way to find out about data
> post-processing errors in the subprocesses :(
>
>
>
>  Andrey Pankov wrote:
>
> > Hi Rick,
> >
> > Thank you for the quick response! I see this feature is in trunk and not
> > available in the last stable release. Anyway, I will try it from trunk to
> > see whether it works for me, and will also check whether it catches
> > segmentation faults.
> >
> >
> > Rick Cox wrote:
> >
> > > Try "-jobconf stream.non.zero.exit.status.is.failure=true".
> > >
> > > That will tell streaming that a non-zero exit is a task failure. To
> > > turn that into an immediate whole job failure, I think configuring 0
> > > task retries (mapred.map.max.attempts=1 and
> > > mapred.reduce.max.attempts=1) will be sufficient.
> > >
> > > rick
> > >
> > > On Tue, May 13, 2008 at 8:15 PM, Andrey Pankov <[EMAIL PROTECTED]> wrote:
> > >
> > > > Hi all,
> > > >
> > > >  I'm looking for a way to force Streaming to shut down the whole job
> > > > when one of its subprocesses exits with a non-zero error code.
> > > >
> > > >  We have the following situation: sometimes either the mapper or the
> > > > reducer crashes and, as a rule, returns a non-zero exit code. In that
> > > > case the entire streaming job still finishes successfully, but that's
> > > > wrong. Much the same happens when a subprocess dies with a
> > > > segmentation fault.
> > > >
> > > >  The only way to check automatically whether a subprocess crashed is
> > > > via the logs, but that means parsing tons of outputs/logs/dirs/etc.
> > > >  To find the logs of your job you have to know its jobid, e.g.
> > > > job_200805130853_0016. I don't know an easy way to determine it other
> > > > than scanning stdout for the pattern. Then you have to find the logs
> > > > of each mapper and each reducer, find a way to parse them, etc, etc...
> > > >
> > > >  So, is there an easier way to get the correct status of the whole
> > > > streaming job, or do I still have to build a rather fragile parsing
> > > > system for this purpose?
> > > >
> > > >  Thanks in advance.
> > > >
> > > >  --
> > > >  Andrey Pankov
> > > >
> > > >
> > > >
> > >
> > >
> >
> >
> >
>
>
>  --
>  Andrey Pankov
>
>
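
For reference, here's a sketch of a complete invocation combining the
settings discussed above. The jar path, HDFS paths, and script name are
placeholders for your own setup, not taken from the thread:

    hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming.jar \
        -input /user/andrey/input \
        -output /user/andrey/output \
        -mapper fail_after_input.sh \
        -reducer /bin/cat \
        -file fail_after_input.sh \
        -jobconf stream.non.zero.exit.status.is.failure=true \
        -jobconf mapred.map.max.attempts=1 \
        -jobconf mapred.reduce.max.attempts=1

With both max.attempts settings at 1, a single failed attempt should take
down the whole job instead of being retried.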
