[ 
https://issues.apache.org/jira/browse/SPARK-7898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-7898.
------------------------------
    Resolution: Not A Problem

I think this is by design. You're saying that user code's output is all shunted 
to stdout, because PySpark itself is using stderr for its own output that isn't 
user program output. I think that's sensible.

You would never want to rely on this behavior for your program. If you need to 
use the output of a binary, use a piped RDD or similar.
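To make that concrete (this sketch is not from the original thread): rather than redirecting inherited file descriptors to files and trusting the parent process's stream routing, a program can capture a child's stdout and stderr explicitly with pipes. A minimal illustration using the standard {{subprocess}} module, assuming a POSIX shell; the {{sh -c}} command here is a stand-in for a real binary such as {{hadoop fs -text}}:

```python
import subprocess

# Capture stdout and stderr via pipes instead of relying on inherited
# file descriptors, whose routing the parent (e.g. PySpark) may have
# changed. communicate() reads both streams and waits for exit.
p = subprocess.Popen(
    ["sh", "-c", "echo data; echo log >&2"],  # stand-in for a real binary
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)
out, err = p.communicate()
# out holds the binary's actual output, err its diagnostics,
# regardless of where the parent's own stdout/stderr point.
```

For output too large to buffer in memory, streaming from {{p.stdout}} line by line (or, inside a Spark job, a piped RDD) avoids the same pitfall.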

> pyspark merges stderr into stdout
> ---------------------------------
>
>                 Key: SPARK-7898
>                 URL: https://issues.apache.org/jira/browse/SPARK-7898
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 1.3.0
>            Reporter: Sam Steingold
>
> When I type 
> {code}
> hadoop fs -text /foo/bar/baz.bz2 2>err 1>out
> {code}
> I get two non-empty files: {{err}} with 
> {code}
> 2015-05-26 15:33:49,786 INFO  [main] bzip2.Bzip2Factory 
> (Bzip2Factory.java:isNativeBzip2Loaded(70)) - Successfully loaded & 
> initialized native-bzip2 library system-native
> 2015-05-26 15:33:49,789 INFO  [main] compress.CodecPool 
> (CodecPool.java:getDecompressor(179)) - Got brand-new decompressor [.bz2]
> {code}
> and {{out}} with the content of the file (as expected).
> When I call the same command from Python (2.6):
> {code}
> from subprocess import Popen
> with open("out","w") as out:
>     with open("err","w") as err:
>         p = Popen(['hadoop','fs','-text',"/foo/bar/baz.bz2"],
>                   stdin=None,stdout=out,stderr=err)
> print p.wait()
> {code}
> I get the exact same (correct) behavior.
> *However*, when I run the same code under *PySpark* (or using 
> {{spark-submit}}), I get an *empty* {{err}} file and the {{out}} file starts 
> with the log messages above (and then it contains the actual data).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
