[
https://issues.apache.org/jira/browse/AIRFLOW-1255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Bolke de Bruin resolved AIRFLOW-1255.
-------------------------------------
Resolution: Fixed
Fix Version/s: 1.8.3
Issue resolved by pull request #2438
[https://github.com/apache/incubator-airflow/pull/2438]
> SparkSubmitOperator logs do not stream correctly
> ------------------------------------------------
>
> Key: AIRFLOW-1255
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1255
> Project: Apache Airflow
> Issue Type: Bug
> Components: hooks, operators
> Affects Versions: Airflow 1.8
> Environment: Spark 1.6.0 with Yarn cluster
> Airflow 1.8
> Reporter: Himanshu Jain
> Priority: Minor
> Labels: easyfix
> Fix For: 1.8.3
>
>
> Logging in SparkSubmitOperator does not work as intended (continuous logging
> as received in the subprocess). This is because, spark-submit internally
> redirects all logs to stdout (including stderr), which causes the current two
> iterator logging to get stuck with empty stderr pipe. The logs are written
> only when the subprocess finishes. This leads to yarn_application_id not
> being available until the end of application.
> Specifically,
> {code:title= spark_submit_hook.py (lines 217-220)|borderStyle=solid}
> self._sp = subprocess.Popen(spark_submit_cmd,
> stdout=subprocess.PIPE,
> stderr=subprocess.PIPE,
> **kwargs)
> {code}
> needs to be changed to
> {code:title= spark_submit_hook.py|borderStyle=solid}
> self._sp = subprocess.Popen(spark_submit_cmd,
> stdout=subprocess.PIPE,
> **kwargs)
> {code}
> with subsequent changes in the following lines.
> I have not tested whether the issue exists with spark 2 versions as well or
> not.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)