Michael Schmeißer commented on SPARK-650:

I agree that static initialization would solve the problem for cases where 
everything is known or can be loaded at class-loading time, e.g. from property 
files in the artifact itself.

For situations like RecordReaders, it might also work, because they have an 
initialize method where they get contextual information that could have been 
enriched with the required values from the driver.

However, we also have other cases where information from the driver is needed. 
Imagine the following case: We have a temporary directory in HDFS which is 
determined by the Oozie workflow instance ID. The driver knows this 
information, because it is provided by Oozie via main method arguments. The 
executor needs this information as well, e.g. to load some data that is 
required to initialize a static context. Then, the question arises: How does 
the information get to the executor?
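
To make the starting point concrete, here is a rough sketch of the driver 
side in plain Python. The --workflow-id flag, the base path and the helper 
names are made up for illustration; they are not what Oozie or Spark 
prescribe. The point is only that the derived directory exists in the driver 
process and nowhere else.

```python
import argparse


def parse_driver_args(argv):
    """Parse the arguments the workflow engine hands to the driver's main method."""
    parser = argparse.ArgumentParser(description="driver entry point (sketch)")
    # Hypothetical flag name; the real argument layout depends on the workflow definition.
    parser.add_argument("--workflow-id", required=True)
    return parser.parse_args(argv)


def temp_dir_for(workflow_id, base="/tmp/workflows"):
    """Derive the per-workflow temporary directory (hypothetical path layout)."""
    return f"{base}/{workflow_id}"


# Only the driver sees these arguments; executors are separate processes.
args = parse_driver_args(["--workflow-id", "0000001-oozie-wf"])
TEMP_DIR = temp_dir_for(args.workflow_id)
```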

Either it travels with the function instance, which would mean that the 
developer of the function needs to know that he has to call an initialization 
method in every function, or at least in the first function applied to each 
RDD (which he probably doesn't know, because he received the RDD from a 
different part of the application). Or it is delivered by an explicit 
mechanism that is executed before the developer's functions run on any 
executor, which leads me back to the "empty RDD" workaround.

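A minimal sketch of the first option, in plain Python without Spark so that 
it runs on its own. ExecutorContext, WithSetup and tag_with_temp_dir are 
hypothetical names, not Spark API. The wrapper carries the driver-provided 
value with the function instance and triggers an idempotent initialization 
before the wrapped function runs; in a real job the wrapper would be what 
gets passed to the RDD operation.

```python
import threading


class ExecutorContext:
    """Process-wide ("static") context, initialized at most once per process."""

    _lock = threading.Lock()
    _temp_dir = None
    init_count = 0  # only here to show that initialization runs once

    @classmethod
    def ensure_initialized(cls, temp_dir):
        # Idempotent: the first caller in this process wins, later calls are no-ops.
        with cls._lock:
            if cls._temp_dir is None:
                cls._temp_dir = temp_dir
                cls.init_count += 1
        return cls._temp_dir

    @classmethod
    def temp_dir(cls):
        if cls._temp_dir is None:
            raise RuntimeError("ExecutorContext used before initialization")
        return cls._temp_dir


class WithSetup:
    """Wraps a user function so the driver-provided value travels with it."""

    def __init__(self, temp_dir, func):
        self.temp_dir = temp_dir
        self.func = func

    def __call__(self, record):
        # Every call checks; only the first one per process actually initializes.
        ExecutorContext.ensure_initialized(self.temp_dir)
        return self.func(record)


def tag_with_temp_dir(record):
    """Example user function that relies on the static context being ready."""
    return f"{ExecutorContext.temp_dir()}/{record}"


mapper = WithSetup("/tmp/wf-0000001", tag_with_temp_dir)
results = [mapper(r) for r in ["a", "b"]]
```

This also shows the drawback described above: tag_with_temp_dir only works 
if whoever applies it remembers to wrap it, which is exactly the knowledge 
the developer may not have.
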
> Add a "setup hook" API for running initialization code on each executor
> -----------------------------------------------------------------------
>                 Key: SPARK-650
>                 URL: https://issues.apache.org/jira/browse/SPARK-650
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>            Reporter: Matei Zaharia
>            Priority: Minor
> Would be useful to configure things like reporting libraries

This message was sent by Atlassian JIRA
