[
https://issues.apache.org/jira/browse/BEAM-11735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Claudio Venturini updated BEAM-11735:
-------------------------------------
Description:
Log messages emitted by any DoFn are not logged by Spark executors when the
pipeline is run with Spark in cluster deployment mode (on YARN). Tested on
Cloudera 6 with Spark 2.4.
I made a test project to reproduce the issue:
[https://github.com/ventuc/beam-log-test]. Run it with:
{{spark-submit --class beam.tests.log.LogTesting --name LogTesting
--deploy-mode cluster --master yarn --conf
"spark.driver.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties"
--conf
"spark.executor.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties"
--files $HOME/log4j.properties beam-log-test-0.0.1-SNAPSHOT.jar}}
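For reference, the log4j.properties shipped via --files is an ordinary log4j 1.x
console configuration along these lines (a sketch only; the exact file is in the
linked repo, and the beam.tests.log logger name is assumed from the test package):
{code}
# Root logger writes to a console appender, which YARN log aggregation captures.
log4j.rootLogger=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
# Verbose output for the package under test.
log4j.logger.beam.tests.log=DEBUG
{code}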
To retrieve logs from YARN run:
{{yarn logs -applicationId <app_id>}}
As you can see, log messages from the beam.tests.log logger appear only in
the driver's log, not in the executors' logs.
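The DoFn in the test project does little more than log each element through
SLF4J, roughly like this (an illustrative sketch; the class name and element
type are placeholders, the real code is in the repo linked above):
{code:java}
package beam.tests.log;

import org.apache.beam.sdk.transforms.DoFn;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Minimal DoFn that logs each element it processes.
public class LoggingFn extends DoFn<String, String> {
  private static final Logger LOG = LoggerFactory.getLogger(LoggingFn.class);

  @ProcessElement
  public void processElement(@Element String element, OutputReceiver<String> out) {
    // Bug symptom: in cluster deployment mode this executor-side message
    // never shows up in the YARN executor logs.
    LOG.info("Processing element: {}", element);
    out.output(element);
  }
}
{code}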
There is no documentation about how to handle logging in Beam with the
Spark runner. Please document it, as also requested in BEAM-792.
was:
Log messages emitted by any DoFn are not logged by Spark executors when the
pipeline is run with Spark in cluster deployment mode (on YARN). Tested on
Cloudera 6 with Spark 2.4.
I made a test project to reproduce the issue:
[https://github.com/ventuc/beam-log-test]. Run it with:
{{spark-submit --class beam.tests.log.LogTesting --name LogTesting
--deploy-mode cluster --master yarn --conf
"spark.driver.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties"
--conf
"spark.executor.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties"
--files $HOME/log4j.properties beam-log-test-0.0.1-SNAPSHOT.jar}}
To retrieve logs from YARN run:
{{yarn logs -applicationId <app_id>}}
As you can see, log messages from the beam.tests.log logger appear only in
the driver's log, not in the executors' logs.
There is no documentation about how to handle logging in Beam with the
Spark runner. Please document it, as also requested in
[BEAM-792|https://issues.apache.org/jira/browse/BEAM-792].
> Logging from DoFn doesn't work with Spark Runner in cluster mode
> ----------------------------------------------------------------
>
> Key: BEAM-11735
> URL: https://issues.apache.org/jira/browse/BEAM-11735
> Project: Beam
> Issue Type: Bug
> Components: runner-spark, sdk-java-core
> Affects Versions: 2.26.0, 2.27.0
> Environment: Cloudera 6, Hadoop 3, Spark 2.4
> Reporter: Claudio Venturini
> Priority: P1
> Labels: SLF4J, log-aggregation, log4j, logging, spark
>
> Log messages emitted by any DoFn are not logged by Spark executors when the
> pipeline is run with Spark in cluster deployment mode (on YARN). Tested on
> Cloudera 6 with Spark 2.4.
> I made a test project to reproduce the issue:
> [https://github.com/ventuc/beam-log-test]. Run it with:
> {{spark-submit --class beam.tests.log.LogTesting --name LogTesting
> --deploy-mode cluster --master yarn --conf
> "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties"
> --conf
> "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties"
> --files $HOME/log4j.properties beam-log-test-0.0.1-SNAPSHOT.jar}}
> To retrieve logs from YARN run:
> {{yarn logs -applicationId <app_id>}}
> As you can see, log messages from the beam.tests.log logger appear only in
> the driver's log, not in the executors' logs.
>
> There is no documentation about how to handle logging in Beam with the
> Spark runner. Please document it, as also requested in BEAM-792.
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)