[ 
https://issues.apache.org/jira/browse/SPARK-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15580055#comment-15580055
 ] 

Michael Schmeißer commented on SPARK-650:
-----------------------------------------

OK, let me explain the specific problems we have encountered, which might help 
in understanding the issue and possible solutions:

We need to run some code on the executors before anything gets processed, e.g. 
initialization of the log system or context setup. To do this, we need 
information that is present on the driver but not on the executors. Our 
current solution is to provide a base class for Spark function implementations 
which carries the information from the driver and initializes everything in 
its readObject method. Since multiple narrow-dependent functions may 
subsequently be executed on the same executor JVM, this class needs to ensure 
that initialization doesn't run multiple times. Sure, that's not hard to do, 
but if you mix setup and cleanup logic for functions, partitions and/or the JVM 
itself, it can get quite confusing without explicit hooks.
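To make the pattern concrete, here is a minimal sketch of such a base class, using plain Java serialization and a round-trip in place of a real Spark shuffle of the closure. All names (ExecutorInitializingFunction, the log-destination field, MyMapFunction) are hypothetical illustrations, not our actual code or any Spark API:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical base class: carries driver-side settings in an instance field
// and performs one-time per-JVM setup when it is deserialized on an executor.
abstract class ExecutorInitializingFunction implements Serializable {
    private static final long serialVersionUID = 1L;
    // Guard: repeated deserialization on the same JVM must initialize only once.
    private static final AtomicBoolean initialized = new AtomicBoolean(false);
    static volatile String logDestination; // set during first deserialization

    private final String driverSideLogDestination; // captured on the driver

    ExecutorInitializingFunction(String driverSideLogDestination) {
        this.driverSideLogDestination = driverSideLogDestination;
    }

    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        if (initialized.compareAndSet(false, true)) {
            // One-time per-JVM setup, e.g. configuring the log system.
            logDestination = driverSideLogDestination;
        }
    }
}

public class SetupHookSketch {
    // A concrete function must extend the base class, or initialization is skipped.
    static class MyMapFunction extends ExecutorInitializingFunction {
        MyMapFunction(String dest) { super(dest); }
        int apply(int x) { return x + 1; }
    }

    // Serialize and deserialize, standing in for shipping the closure to an executor.
    static <T> T roundTrip(T obj) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(obj);
        }
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
            @SuppressWarnings("unchecked") T copy = (T) in.readObject();
            return copy;
        }
    }

    public static void main(String[] args) throws Exception {
        MyMapFunction f = roundTrip(new MyMapFunction("hdfs://logs/app"));
        System.out.println(ExecutorInitializingFunction.logDestination);
        System.out.println(f.apply(1));
    }
}
```

The AtomicBoolean is what keeps the setup idempotent when several functions land on the same JVM; everything else is the bookkeeping that an explicit hook API would make unnecessary.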

So our solution basically works, but with this approach you can't use lambdas 
for Spark functions, which is quite inconvenient, especially for simple map 
operations. Even worse, if you use a lambda or otherwise forget to extend the 
required base class, the initialization doesn't happen and very strange 
exceptions follow, depending on which resource your function tries to access 
during its execution. Or, if you are really unlucky, no exception occurs at 
all, but the log messages get written to the wrong destination. Such cases are 
very hard to prevent without an explicit initialization mechanism, and in a 
team with several developers you can't expect everyone to know what is going 
on there.
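By contrast, the explicit hook API this issue asks for would decouple setup from the function classes entirely. A minimal sketch of the mechanism, in plain Java with an invented ExecutorSetupHooks registry (this is not a proposed Spark signature, just the shape of the idea): hooks are registered once, and the task runner triggers them exactly once per JVM before the first task, no matter how the tasks themselves are written:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical registry: setup hooks run exactly once per executor JVM,
// before the first task, independent of how the functions are implemented.
class ExecutorSetupHooks {
    private static final List<Runnable> hooks = new ArrayList<>();
    private static final AtomicBoolean ran = new AtomicBoolean(false);

    static synchronized void register(Runnable hook) {
        hooks.add(hook);
    }

    // Imagined to be called by the task runner before any task executes.
    static void runOnce() {
        if (ran.compareAndSet(false, true)) {
            for (Runnable h : hooks) {
                h.run();
            }
        }
    }
}

public class HookSketch {
    static final List<String> log = new ArrayList<>();

    public static void main(String[] args) {
        ExecutorSetupHooks.register(() -> log.add("log system configured"));
        // Simulate several narrow-dependent tasks landing on the same JVM:
        // the hook fires once, then every task runs normally.
        for (int task = 0; task < 3; task++) {
            ExecutorSetupHooks.runOnce();
            log.add("task " + task);
        }
        System.out.println(log);
    }
}
```

With something like this, a lambda passed to a map operation can't bypass initialization, because setup no longer depends on the function's class hierarchy at all.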

> Add a "setup hook" API for running initialization code on each executor
> -----------------------------------------------------------------------
>
>                 Key: SPARK-650
>                 URL: https://issues.apache.org/jira/browse/SPARK-650
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>            Reporter: Matei Zaharia
>            Priority: Minor
>
> Would be useful to configure things like reporting libraries



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
