[jira] [Commented] (AIRFLOW-1255) SparkSubmitOperator logs do not stream correctly

ASF subversion and git services (JIRA) Fri, 14 Jul 2017 03:22:51 -0700

    [ https://issues.apache.org/jira/browse/AIRFLOW-1255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16087143#comment-16087143 ]


ASF subversion and git services commented on AIRFLOW-1255:
----------------------------------------------------------

Commit 28aeed4aa62acb295eda1cf94a40f7f643b650fb in incubator-airflow's branch 
refs/heads/master from [~ajs]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=28aeed4 ]

[AIRFLOW-1255] Fix SparkSubmitHook output deadlock

Refactor the SparkSubmitHook output processing to
avoid a deadlock.
The prior implementation deadlocks if the stderr
pipe buffer fills:

1. Airflow tries to drain stdout before starting
on stderr;
2. Spark gets suspended if the stderr pipe fills;
3. Both processes are now waiting for something
that will never happen.

Closes #2438 from asnare/fix/spark-submit
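
One way to avoid the deadlock described in the commit message is to give the
child a single output pipe and drain it continuously. The sketch below is not
the code from commit 28aeed4; it is a minimal illustration that assumes it is
acceptable to merge stderr into stdout. The names run_and_stream and on_line
are hypothetical.

```python
import subprocess
import sys


def run_and_stream(cmd, on_line=print):
    """Run cmd and hand each output line to on_line as soon as it arrives.

    stderr is merged into stdout (an assumption of this sketch), so there is
    only one pipe to drain and the child cannot block on a full stderr pipe
    while the parent is still waiting on stdout.
    """
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,   # single pipe: nothing is left undrained
        universal_newlines=True,    # text mode, so lines are str
    )
    # readline returns "" only at end of file, i.e. when the child has
    # closed its end of the pipe.
    for line in iter(proc.stdout.readline, ""):
        on_line(line.rstrip("\n"))
    proc.stdout.close()
    return proc.wait()


if __name__ == "__main__":
    # Hypothetical child command standing in for spark-submit.
    exit_code = run_and_stream(
        [sys.executable, "-c", "import sys; sys.stderr.write('hello from stderr')"]
    )
    print("exit code:", exit_code)
```

If stdout and stderr must stay separate, the usual alternatives are one reader
thread per pipe or Popen.communicate(), at the cost of more moving parts or of
losing line-by-line streaming.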


> SparkSubmitOperator logs do not stream correctly
> ------------------------------------------------
>
>                 Key: AIRFLOW-1255
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1255
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: hooks, operators
>    Affects Versions: Airflow 1.8
>         Environment: Spark 1.6.0 with Yarn cluster
> Airflow 1.8
>            Reporter: Himanshu Jain
>            Priority: Minor
>              Labels: easyfix
>             Fix For: 1.8.3
>
>
> Logging in SparkSubmitOperator does not work as intended (continuous logging
> as output is received from the subprocess). This is because spark-submit
> internally redirects all logs to stdout (including stderr), which causes the
> current two-iterator logging to get stuck on the empty stderr pipe. The logs
> are written only when the subprocess finishes, so yarn_application_id is not
> available until the application ends.
> Specifically,
> {code:title= spark_submit_hook.py (lines 217-220)|borderStyle=solid}
> self._sp = subprocess.Popen(spark_submit_cmd,
>                             stdout=subprocess.PIPE,
>                             stderr=subprocess.PIPE,
>                             **kwargs)
> {code}
> needs to be changed to 
> {code:title= spark_submit_hook.py|borderStyle=solid}
> self._sp = subprocess.Popen(spark_submit_cmd,
>                             stdout=subprocess.PIPE,
>                             **kwargs)
> {code}
> with subsequent changes in the following lines.
> I have not tested whether the issue also exists with Spark 2 versions.
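
The practical consequence named in the description is that
yarn_application_id only becomes available once the application ends. With
output arriving line by line, the ID can be picked up as soon as it is logged.
The sketch below assumes the ID appears in the output in the usual YARN form
application_<timestamp>_<sequence>; the function name and the sample log line
are made up for illustration and are not taken from spark_submit_hook.py.

```python
import re

# YARN application IDs look like application_<clusterTimestamp>_<sequence>.
APP_ID_RE = re.compile(r"application_\d+_\d+")


def find_yarn_application_id(lines):
    """Return the first YARN application ID seen in lines, or None.

    lines may be any iterable, including a generator that yields output as
    the subprocess produces it; iteration stops at the first match, so the
    ID is available long before the stream ends.
    """
    for line in lines:
        match = APP_ID_RE.search(line)
        if match:
            return match.group(0)
    return None


if __name__ == "__main__":
    # Illustrative log lines, not captured from a real spark-submit run.
    sample = [
        "INFO Client: Requesting a new application from cluster",
        "INFO Client: Application report for application_1499999999999_0042 (state: ACCEPTED)",
    ]
    print(find_yarn_application_id(sample))
```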



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
