[ 
https://issues.apache.org/jira/browse/PIG-4976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-4976:
------------------------------
    Attachment: PIG-4976-5-knoguchi.patch

(I) My previous patch {{PIG-4976-3.patch}} handled both cases by relying on the 
exceptions thrown inside close().  However, as Nandor pointed out, the 
exception from (1), 
"Error while reading from POStream and passing it to the streaming process:
java.io.FileNotFoundException: foo (No such file or directory)"
was misleading because it comes from checking the output file "foo" AFTER an 
exit code of 255 had already been seen.  We probably shouldn't be checking the 
output file in the first place.

Here, I've added a line to signal an error when exitcode != 0 inside close() 
and to skip the output file check.
I still need the outer catch block to signal the error for case (2). 
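
To make the ordering concrete, here is a minimal sketch of the idea and not the code in the patch. The class {{CloseSketch}}, the exception type {{StreamingFailedException}} and the method signature are all hypothetical names made up for illustration. It only shows that the exit code is checked first, so the output file is never consulted after a failed run.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;

// Hypothetical sketch; none of these names come from the Pig code base.
final class CloseSketch {

    /** Raised when the streaming command exits with a non-zero status. */
    static final class StreamingFailedException extends IOException {
        final int exitCode;

        StreamingFailedException(String command, int exitCode) {
            super("'" + command + "' failed with exit status: " + exitCode);
            this.exitCode = exitCode;
        }
    }

    /**
     * Checks the exit code before looking at the staged output file.
     * A non-zero exit code is reported as the failure, and the output
     * file check is skipped in that case.
     */
    static void close(String command, int exitCode, File outputFile)
            throws IOException {
        if (exitCode != 0) {
            // Report the real cause; do not touch the output file at all.
            throw new StreamingFailedException(command, exitCode);
        }
        if (!outputFile.exists()) {
            // Only reachable when the command itself succeeded.
            throw new FileNotFoundException(
                    outputFile + " (No such file or directory)");
        }
    }
}
```

With this ordering, a script that dies with exit status 255 surfaces as a "failed with exit status" error rather than as a FileNotFoundException for "foo".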

(II) As for the NullPointerException from killProcess, we should probably get 
rid of it. 

I added an extra null check in OutputHandler.java and also made killProcess 
ignore any exceptions from inputhandler.close() and outputhandler.close(). 
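
As a rough illustration of that cleanup behaviour (again a sketch under assumptions, not the patch itself), the handlers are modelled below as plain {{java.io.Closeable}} objects. The class {{KillProcessSketch}} and the helper {{closeQuietly}} are hypothetical names, and the {{process.destroy()}} call is an assumption about what the cleanup does.

```java
import java.io.Closeable;

// Hypothetical sketch; the real handlers are Pig classes, not plain Closeables.
final class KillProcessSketch {

    /**
     * Closes the handler if it is non-null and swallows any exception.
     * Returns true only when close() completed without throwing.
     */
    static boolean closeQuietly(Closeable handler) {
        if (handler == null) {
            return false; // nothing to close, and no NullPointerException
        }
        try {
            handler.close();
            return true;
        } catch (Exception e) {
            // Cleanup path only: the original failure is reported elsewhere.
            return false;
        }
    }

    /** Best-effort cleanup that never throws, even with missing handlers. */
    static void killProcess(Process process,
                            Closeable inputHandler,
                            Closeable outputHandler) {
        closeQuietly(inputHandler);
        closeQuietly(outputHandler);
        if (process != null) {
            process.destroy();
        }
    }
}
```

The point of swallowing exceptions here is that a "Stream closed" error raised during cleanup should not mask the error that triggered the cleanup in the first place.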

(III) Modified the test case to handle small and big inputs.  In my 
environment, they went through the two different paths, (1) and (2). 

For (1), it'll show 
{noformat}
2016-09-20 17:45:38,630 WARN  [Thread-263] mapred.LocalJobRunner (LocalJobRunner.java:run(560)) - job_local2000154943_0012
java.lang.Exception: org.apache.pig.backend.executionengine.ExecException: ERROR 2055: Received Error while processing the map plan: 'perl script212937190666196237pl (stdin-org.apache.pig.builtin.PigStreaming/foo-org.apache.pig.builtin.PigStreaming())' failed with exit status: 255
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
{noformat}
whereas for (2), it'll only show
{noformat}
2016-09-20 17:45:39,304 WARN  [Thread-320] streaming.ExecutableManager (ExecutableManager.java:run(340)) - Exception while trying to write to stream binary's input
java.io.IOException: Stream closed
    at java.lang.ProcessBuilder$NullOutputStream.write(ProcessBuilder.java:433)
    at java.io.OutputStream.write(OutputStream.java:116)
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
    at java.io.DataOutputStream.write(DataOutputStream.java:107)
    at org.apache.pig.impl.streaming.InputHandler.putNext(InputHandler.java:72)
    at org.apache.pig.impl.streaming.ExecutableManager$ProcessInputThread.run(ExecutableManager.java:335)
2016-09-20 17:45:39,305 ERROR [Thread-320] streaming.ExecutableManager (ExecutableManager.java:run(369)) - Error while reading from POStream and passing it to the streaming process:
java.io.IOException: Stream closed
    at java.lang.ProcessBuilder$NullOutputStream.write(ProcessBuilder.java:433)
    at java.io.OutputStream.write(OutputStream.java:116)
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
    at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
    at java.io.DataOutputStream.flush(DataOutputStream.java:123)
    at org.apache.pig.impl.streaming.InputHandler.close(InputHandler.java:93)
    at org.apache.pig.impl.streaming.DefaultInputHandler.close(DefaultInputHandler.java:50)
    at org.apache.pig.impl.streaming.ExecutableManager.close(ExecutableManager.java:114)
    at org.apache.pig.backend.hadoop.streaming.HadoopExecutableManager.close(HadoopExecutableManager.java:131)
    at org.apache.pig.impl.streaming.ExecutableManager$ProcessInputThread.run(ExecutableManager.java:352)
2016-09-20 17:45:39,306 INFO  [Thread-293] mapred.LocalJobRunner (LocalJobRunner.java:runTasks(456)) - map task executor complete.
2016-09-20 17:45:39,309 WARN  [Thread-293] mapred.LocalJobRunner (LocalJobRunner.java:run(560)) - job_local331634717_0013
java.lang.Exception: org.apache.pig.backend.executionengine.ExecException: ERROR 2055: Received Error while processing the map plan: Error while reading from POStream and passing it to the streaming process:Stream closed
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
{noformat}

In both cases, stderr will show the syntax errors.   
{panel}
Can't locate object method "syntax" via package "error" (perhaps you forgot to 
load "error"?) at script4635494469399418013pl line 2.
{panel}

> streaming job with store clause stuck if the script fail
> --------------------------------------------------------
>
>                 Key: PIG-4976
>                 URL: https://issues.apache.org/jira/browse/PIG-4976
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>            Reporter: Daniel Dai
>            Assignee: Daniel Dai
>             Fix For: 0.17.0
>
>         Attachments: PIG-4976-1.patch, PIG-4976-2.patch, PIG-4976-3.patch, 
> PIG-4976-4.patch, PIG-4976-5-knoguchi.patch
>
>
> When investigating PIG-4972, I also noticed that a Pig job gets stuck when 
> the perl script has a syntax error. This happens if we have an output clause 
> in the stream specification (meaning a file is used as staging). The bug 
> exists in both Tez and MR, and it is not a regression.
> Here is an example:
> {code}
> define CMD `perl kk.pl` output('foo') ship('kk.pl');
> A = load 'studenttab10k' as (name, age, gpa);
> B = foreach A generate name;
> C = stream B through CMD;
> store C into 'ooo';
> {code}
> kk.pl is any perl script containing a syntax error.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
