Claudio Venturini created BEAM-11735:
----------------------------------------
Summary: Logging from DoFn doesn't work with Spark Runner in cluster mode
Key: BEAM-11735
URL: https://issues.apache.org/jira/browse/BEAM-11735
Project: Beam
Issue Type: Bug
Components: runner-spark, sdk-java-core
Affects Versions: 2.27.0, 2.26.0
Environment: Cloudera 6, Hadoop 3, Spark 2.4
Reporter: Claudio Venturini
Log messages emitted by any DoFn are not logged by the Spark executors when the
pipeline is run with Spark in cluster deployment mode (on YARN). Tested on
Cloudera 6 with Spark 2.4.
I made a test project to reproduce the issue:
https://github.com/ventuc/beam-log-test. Run it with:
{{spark-submit --class beam.tests.log.LogTesting --name LogTesting
--deploy-mode cluster --master yarn
--conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties"
--conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties"
--files $HOME/log4j.properties beam-log-test-0.0.1-SNAPSHOT.jar}}
To retrieve the logs from YARN, run:
{{yarn logs -applicationId <app_id>}}
As you can see, log messages from the beam.tests.log package appear only in the
driver's log, not in the executors' logs.
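To narrow this down, it helps to know where {{-Dlog4j.configuration=file:log4j.properties}} actually points on each JVM. The Java sketch below is a hypothetical diagnostic: the class {{Log4jConfigCheck}} and its method {{resolveConfig}} are illustrative names, not part of Beam, Spark or log4j. It assumes log4j 1.x reads the {{log4j.configuration}} property as a URL, so a relative URL such as {{file:log4j.properties}} resolves against the JVM's working directory, which is where {{--files}} is documented to place {{log4j.properties}} for each executor.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Hypothetical diagnostic (not part of Beam, Spark or log4j): shows where a
 * log4j configuration URL such as "file:log4j.properties" points on this JVM.
 */
public class Log4jConfigCheck {

    /** Resolves a "file:" URL to an absolute path; relative paths use the working directory. */
    static Path resolveConfig(String configUrl) {
        URL url;
        try {
            url = new URL(configUrl);
        } catch (MalformedURLException e) {
            // log4j 1.x may fall back to a classpath lookup here; this sketch does not.
            throw new IllegalArgumentException("not a URL: " + configUrl, e);
        }
        if (!"file".equals(url.getProtocol())) {
            throw new IllegalArgumentException("not a file URL: " + configUrl);
        }
        // "file:log4j.properties" has the relative path "log4j.properties", which
        // java.nio resolves against the JVM's working directory (user.dir).
        return Paths.get(url.getPath()).toAbsolutePath().normalize();
    }

    public static void main(String[] args) {
        // Defaults to the value passed via spark-submit if the property is unset.
        String configUrl = System.getProperty("log4j.configuration", "file:log4j.properties");
        Path resolved = resolveConfig(configUrl);
        System.out.println("log4j.configuration = " + configUrl);
        System.out.println("working directory   = " + System.getProperty("user.dir"));
        System.out.println("resolves to         = " + resolved);
        System.out.println("file exists         = " + Files.exists(resolved));
    }
}
```

Comparing its output on the driver and on an executor would show whether each JVM actually finds {{log4j.properties}}; it does not by itself explain why the executor log lines are missing.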
There is no documentation on how to handle logging in Beam with the Spark
runner. Please document it, as also requested in
https://issues.apache.org/jira/browse/BEAM-792
--
This message was sent by Atlassian Jira
(v8.3.4#803005)