[ 
https://issues.apache.org/jira/browse/SPARK-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15725542#comment-15725542
 ] 

Michael Schmeißer commented on SPARK-650:
-----------------------------------------

Sure, it can be included in the closure, and this was also our first solution to 
the problem. But if the application has many layers and you often need the 
resource which requires info X to initialize, it soon gets very inconvenient 
because you have to pass X around a lot and pollute your APIs.

Thus, our next solution was to create a base function class which takes X in 
its constructor and makes sure that the resource is initialized on the executor 
side if it wasn't before. The drawback of this solution is that the function 
developer can forget to extend the function base class; the function then may 
or may not be able to access the resource, depending on whether a function 
which performed the initialization has run before on the same executor. This is 
really error-prone (it actually led to errors) and, even if done correctly, 
prevents lambdas from being used for functions.
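
To make the base-class approach and its pitfall concrete, here is a minimal 
sketch in plain Java. It is not our actual code and does not use Spark: 
java.util.function.Function stands in for Spark's function interfaces, and the 
names ExecutorResource, ResourceAwareFunction and Tagger are made up for this 
example. A single JVM plays the role of one executor.

```java
import java.io.Serializable;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Function;

// Hypothetical executor-side resource: at most one instance per JVM.
final class ExecutorResource {
    private static final AtomicReference<ExecutorResource> INSTANCE = new AtomicReference<>();

    final String config; // stands in for "info X"

    private ExecutorResource(String config) {
        this.config = config;
    }

    // Initializes the resource on first call; later calls keep the first instance.
    static ExecutorResource initIfNeeded(String config) {
        if (INSTANCE.get() == null) {
            INSTANCE.compareAndSet(null, new ExecutorResource(config));
        }
        return INSTANCE.get();
    }

    // Fails if nothing has triggered the initialization in this JVM yet.
    static ExecutorResource get() {
        ExecutorResource resource = INSTANCE.get();
        if (resource == null) {
            throw new IllegalStateException("resource not initialized in this JVM");
        }
        return resource;
    }
}

// Base class carrying X; X is serialized together with the function.
abstract class ResourceAwareFunction<T, R> implements Function<T, R>, Serializable {
    private final String x;

    protected ResourceAwareFunction(String x) {
        this.x = x;
    }

    @Override
    public final R apply(T input) {
        ExecutorResource.initIfNeeded(x); // no-op after the first call
        return doApply(input);
    }

    protected abstract R doApply(T input);
}

// A function written the intended way: it extends the base class.
final class Tagger extends ResourceAwareFunction<String, String> {
    Tagger(String x) {
        super(x);
    }

    @Override
    protected String doApply(String input) {
        return ExecutorResource.get().config + ":" + input;
    }
}

class ResourceAwareFunctionDemo {
    public static void main(String[] args) {
        // A lambda cannot extend the base class, so it depends on someone else's init.
        Function<String, String> lambda = in -> ExecutorResource.get().config + ":" + in;
        try {
            lambda.apply("a");
        } catch (IllegalStateException e) {
            System.out.println("lambda failed: " + e.getMessage());
        }
        System.out.println(new Tagger("cfg").apply("a"));
        System.out.println(lambda.apply("b")); // works only because Tagger ran first
    }
}
```

In this sketch the first lambda call fails because nothing has initialized the 
resource yet, while the second one succeeds only because a Tagger happened to 
run in the same JVM before it. That order dependence is the error-prone part.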

As a result, we now use the "empty RDD" approach or piggy-back on the Spark 
JavaSerializer. Both work fine and initialize the executor-side resource 
properly on all executors. So, from a function developer's point of view that's 
nice, but overall, the solution relies on Spark internals to work, which is why 
I would rather have an explicit mechanism to perform such an initialization.

> Add a "setup hook" API for running initialization code on each executor
> -----------------------------------------------------------------------
>
>                 Key: SPARK-650
>                 URL: https://issues.apache.org/jira/browse/SPARK-650
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>            Reporter: Matei Zaharia
>            Priority: Minor
>
> Would be useful to configure things like reporting libraries


