ibzib commented on a change in pull request #13743:
URL: https://github.com/apache/beam/pull/13743#discussion_r567279548
##########
File path:
runners/spark/src/main/java/org/apache/beam/runners/spark/SparkPipelineRunner.java
##########
@@ -123,10 +134,33 @@ public PortablePipelineResult run(RunnerApi.Pipeline pipeline, JobInfo jobInfo)
"Will stage {} files. (Enable logging at DEBUG level to see which files will be staged.)",
pipelineOptions.getFilesToStage().size());
LOG.debug("Staging files: {}", pipelineOptions.getFilesToStage());
-
PortablePipelineResult result;
final JavaSparkContext jsc =
SparkContextFactory.getSparkContext(pipelineOptions);
+ EventLoggingListener eventLoggingListener = null;
+ if (pipelineOptions.getEventLogEnabled()) {
+ eventLoggingListener =
+ new EventLoggingListener(
+ jobInfo.jobId(),
+ scala.Option.apply(jobInfo.jobName()),
+ new URI(pipelineOptions.getSparkHistoryDir()),
+ jsc.getConf(),
+ jsc.hadoopConfiguration());
+ eventLoggingListener.initializeLogIfNecessary(false, false);
+ eventLoggingListener.start();
+ scala.collection.immutable.Map<String, String> logUrlMap =
+ new scala.collection.immutable.HashMap<String, String>();
+ Tuple2<String, String>[] sparkMasters = jsc.getConf().getAllWithPrefix("spark.master");
+ Tuple2<String, String>[] sparkExecutors = jsc.getConf().getAllWithPrefix("spark.executor.id");
+ for (int i = 0; i < sparkMasters.length; i++) {
+ eventLoggingListener.onExecutorAdded(
+ new SparkListenerExecutorAdded(
+ Instant.now().getMillis(),
+ sparkExecutors[i]._2(),
Review comment:
This assumes that the number of spark.master entries is the same as the number
of spark.executor.id entries, which I don't think is a safe assumption.
In the Spark execution model, there is usually exactly one spark.master and
possibly many executors. I'm not sure whether it is ever possible for there to
be multiple masters.
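
To make the concern concrete, here is a minimal, Spark-free sketch of the loop shape in the diff. Everything in it is hypothetical and for illustration only: the class `ExecutorLoopSketch`, the methods `boundedByMasters` and `boundedByExecutors`, and the sample values are not part of this PR, and plain `String` arrays stand in for the `Tuple2<String, String>[]` arrays returned by `getAllWithPrefix`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not code from the pull request.
final class ExecutorLoopSketch {

  // Mirrors the loop in the diff: the bound comes from the masters array,
  // but the index is applied to the executors array.
  static List<String> boundedByMasters(String[] masters, String[] executorIds) {
    List<String> registered = new ArrayList<>();
    for (int i = 0; i < masters.length; i++) {
      // Throws ArrayIndexOutOfBoundsException if executorIds is shorter than masters.
      registered.add(executorIds[i]);
    }
    return registered;
  }

  // One possible alternative: iterate over the executors themselves,
  // so the number of masters never affects how many executors are visited.
  static List<String> boundedByExecutors(String[] executorIds) {
    List<String> registered = new ArrayList<>();
    for (String id : executorIds) {
      registered.add(id);
    }
    return registered;
  }

  public static void main(String[] args) {
    String[] masters = {"local[4]"};     // assumed: a single master entry
    String[] executors = {"0", "1", "2"}; // assumed: three executor ids

    System.out.println(boundedByMasters(masters, executors)); // prints [0]
    System.out.println(boundedByExecutors(executors));        // prints [0, 1, 2]
  }
}
```

With one master and three executors, the master-bounded loop visits only the first executor. With one master and no executor entries, it throws instead. Iterating over the executors avoids both cases, though whether spark.executor.id is the right source for the executor list is a separate question.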