Hi Rick,
Double-checked my test. The syslog output contains a message about a non-zero
exit code (in this case the mapper finished with a segfault):
2008-05-14 18:12:04,473 INFO org.apache.hadoop.streaming.PipeMapRed:
PipeMapRed.waitOutputThreads(): subprocess exited with code 134 in
org.apache.hadoop.streaming.PipeMapRed
stderr contains a message with a dump or something about the segfault.
The reducer also finished with an error:
2008-05-14 20:28:34,128 INFO org.apache.hadoop.streaming.PipeMapRed:
PipeMapRed.waitOutputThreads(): subprocess exited with code 55 in
org.apache.hadoop.streaming.PipeMapRed
Nevertheless, the entire job is reported as successful:
08/05/14 18:12:03 INFO streaming.StreamJob: map 0% reduce 0%
08/05/14 18:12:05 INFO streaming.StreamJob: map 100% reduce 0%
08/05/14 18:12:06 INFO streaming.StreamJob: map 100% reduce 100%
08/05/14 18:12:06 INFO streaming.StreamJob: Job complete:
job_200805131958_0020
08/05/14 18:12:06 INFO streaming.StreamJob: Output:
/user/hadoop/data1_result
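As an aside on the exit codes above: by the usual POSIX convention a status of 128 + N means "killed by signal N", so 134 is 128 + 6 (SIGABRT, e.g. glibc aborting the process), while a plain segfault (SIGSEGV, signal 11) would normally show up as 139. A small sketch of that convention, using Python's subprocess module:

```python
import subprocess
import sys

# Run a child process that kills itself with SIGABRT (signal 6).
proc = subprocess.run(
    [sys.executable, "-c", "import os, signal; os.kill(os.getpid(), signal.SIGABRT)"]
)

# subprocess reports a signal death as a negative returncode (-N);
# a shell (and the status a wrapper like PipeMapRed sees) is 128 + N.
print(proc.returncode)        # -6
print(128 - proc.returncode)  # 134
```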
Rick Cox wrote:
Hi,
Thanks: that message indicates the stream.non.zero.exit.is.failure
feature isn't enabled for this task; the log is just reporting the
exit status, but not raising the RuntimeException that it would if the
feature were turned on.
I've had problems getting this parameter through from the command line
before. If you've got access, you could try setting it in the
hadoop-site.xml instead (I think it should be the tasktrackers that
read that parameter).
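For reference, a hadoop-site.xml entry for this might look like the following sketch. Note the property is spelled stream.non.zero.exit.is.failure here but stream.non.zero.exit.status.is.failure elsewhere in this thread, so check which name your version actually reads:

```xml
<property>
  <name>stream.non.zero.exit.is.failure</name>
  <value>true</value>
</property>
```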
(Sorry about the confusion here, we've been using that patch for so
long I had forgotten it wasn't yet released, and I'm not exactly sure
where we stand with these other bugs.)
rick
On Wed, May 14, 2008 at 11:05 PM, Andrey Pankov <[EMAIL PROTECTED]> wrote:
Hi Rick,
Double-checked my test. The syslog output contains a message about a non-zero
exit code (in this case the mapper finished with a segfault):
2008-05-14 18:12:04,473 INFO org.apache.hadoop.streaming.PipeMapRed:
PipeMapRed.waitOutputThreads(): subprocess exited with code 134 in
org.apache.hadoop.streaming.PipeMapRed
stderr contains a message with a dump or something about the segfault.
The reducer also finished with an error:
2008-05-14 20:28:34,128 INFO org.apache.hadoop.streaming.PipeMapRed:
PipeMapRed.waitOutputThreads(): subprocess exited with code 55 in
org.apache.hadoop.streaming.PipeMapRed
Nevertheless, the entire job is reported as successful:
08/05/14 18:12:03 INFO streaming.StreamJob: map 0% reduce 0%
08/05/14 18:12:05 INFO streaming.StreamJob: map 100% reduce 0%
08/05/14 18:12:06 INFO streaming.StreamJob: map 100% reduce 100%
08/05/14 18:12:06 INFO streaming.StreamJob: Job complete:
job_200805131958_0020
08/05/14 18:12:06 INFO streaming.StreamJob: Output:
/user/hadoop/data1_result
Rick Cox wrote:
Does the syslog output from a should-have-failed task contain
something like this?
java.lang.RuntimeException: PipeMapRed.waitOutputThreads():
subprocess failed with code 1
(In particular, I'm curious if it mentions the RuntimeException.)
Tasks that consume all their input and then exit non-zero are
definitely supposed to be counted as failed, so there's either a
problem with the setup or a bug somewhere.
rick
On Wed, May 14, 2008 at 8:49 PM, Andrey Pankov <[EMAIL PROTECTED]>
wrote:
Hi,
I've tested this new option "-jobconf
stream.non.zero.exit.status.is.failure=true". It seems to work, but it is
still not good enough for me. When the mapper/reducer program has read all
input data successfully and fails after that, streaming still finishes
successfully, so there is no way to find out about data post-processing
errors in the subprocesses :(
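The failure mode described above can be sketched with a hypothetical streaming mapper that consumes all of stdin, emits its records, and only then exits non-zero (the mapper body and exit code 55 are illustrative, not from a real job):

```python
import subprocess
import sys
import textwrap

# Hypothetical mapper: reads every input line, emits key\t1 pairs,
# then fails during a final post-processing step.
mapper = textwrap.dedent("""\
    import sys
    for line in sys.stdin:
        key = line.rstrip("\\n").split("\\t", 1)[0]
        sys.stdout.write(key + "\\t1\\n")
    sys.exit(55)  # fails only after all input was consumed
""")

proc = subprocess.run(
    [sys.executable, "-c", mapper],
    input="a\tx\nb\ty\n", capture_output=True, text=True,
)
print(proc.stdout)      # all records were emitted...
print(proc.returncode)  # ...yet the process exited with 55
```

All the output looks complete, so unless the framework treats the non-zero exit as a task failure, nothing distinguishes this run from a successful one.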
Andrey Pankov wrote:
Hi Rick,
Thank you for the quick response! I see this feature is in trunk and not
available in the latest stable release. Anyway, I will try whether it works
for me from trunk, and will also check whether it catches segmentation
faults.
Rick Cox wrote:
Try "-jobconf stream.non.zero.exit.status.is.failure=true".
That will tell streaming that a non-zero exit is a task failure. To
turn that into an immediate whole job failure, I think configuring 0
task retries (mapred.map.max.attempts=1 and
mapred.reduce.max.attempts=1) will be sufficient.
rick
On Tue, May 13, 2008 at 8:15 PM, Andrey Pankov <[EMAIL PROTECTED]>
wrote:
Hi all,
I'm looking for a way to force Streaming to shut down the whole job when one
of its subprocesses exits with a non-zero exit code.
We have the following situation. Sometimes either the mapper or the reducer
can crash; as a rule it returns some exit code. In this case the entire
streaming job finishes successfully, but that's wrong. Almost the same
happens when a subprocess finishes with a segmentation fault.
It's possible to check automatically whether a subprocess crashed only via
the logs, but that means you need to parse tons of outputs/logs/dirs/etc.
In order to find the logs of your job you have to know its jobid, e.g.
job_200805130853_0016. I don't know an easy way to determine it other than
scanning stdout for the pattern. Then you have to find the logs of each
mapper and each reducer, find a way to parse them, etc., etc...
So, is there an easy way to get the correct status of the whole streaming
job, or do I still have to build a rather fragile parsing system for such
purposes?
Thanks in advance.
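The jobid scan described above can be sketched as a simple pattern match over the captured StreamJob output (the log lines are taken from this thread):

```python
import re

# Scan captured StreamJob stdout for the job id pattern job_<timestamp>_<seq>.
streamjob_output = (
    "08/05/14 18:12:06 INFO streaming.StreamJob: Job complete: "
    "job_200805131958_0020\n"
    "08/05/14 18:12:06 INFO streaming.StreamJob: Output: /user/hadoop/data1_result\n"
)

match = re.search(r"job_\d+_\d+", streamjob_output)
print(match.group(0))  # job_200805131958_0020
```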
--
Andrey Pankov