[
https://issues.apache.org/jira/browse/AIRFLOW-1255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16087143#comment-16087143
]
ASF subversion and git services commented on AIRFLOW-1255:
----------------------------------------------------------
Commit 28aeed4aa62acb295eda1cf94a40f7f643b650fb in incubator-airflow's branch
refs/heads/master from [~ajs]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=28aeed4 ]
[AIRFLOW-1255] Fix SparkSubmitHook output deadlock
Refactor the SparkSubmitHook output processing to
avoid a deadlock.
The prior implementation deadlocks if the stderr
pipe buffer fills:
1. Airflow tries to drain stdout before starting
on stderr;
2. Spark gets suspended if the stderr pipe fills;
3. Both processes are now waiting for something
that will never happen.
Closes #2438 from asnare/fix/spark-submit
> SparkSubmitOperator logs do not stream correctly
> ------------------------------------------------
>
> Key: AIRFLOW-1255
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1255
> Project: Apache Airflow
> Issue Type: Bug
> Components: hooks, operators
> Affects Versions: Airflow 1.8
> Environment: Spark 1.6.0 with Yarn cluster
> Airflow 1.8
> Reporter: Himanshu Jain
> Priority: Minor
> Labels: easyfix
> Fix For: 1.8.3
>
>
> Logging in SparkSubmitOperator does not work as intended (continuous logging
> as received in the subprocess). This is because, spark-submit internally
> redirects all logs to stdout (including stderr), which causes the current two
> iterator logging to get stuck with empty stderr pipe. The logs are written
> only when the subprocess finishes. This leads to yarn_application_id not
> being available until the end of application.
> Specifically,
> {code:title= spark_submit_hook.py (lines 217-220)|borderStyle=solid}
> self._sp = subprocess.Popen(spark_submit_cmd,
> stdout=subprocess.PIPE,
> stderr=subprocess.PIPE,
> **kwargs)
> {code}
> needs to be changed to
> {code:title= spark_submit_hook.py|borderStyle=solid}
> self._sp = subprocess.Popen(spark_submit_cmd,
> stdout=subprocess.PIPE,
> **kwargs)
> {code}
> with subsequent changes in the following lines.
> I have not tested whether the issue exists with spark 2 versions as well or
> not.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)