[
https://issues.apache.org/jira/browse/SPARK-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15580055#comment-15580055
]
Michael Schmeißer commented on SPARK-650:
-----------------------------------------
Ok, let me explain the specific problems that we have encountered, which might
help in understanding the issue and possible solutions:
We need to run some code on the executors before anything gets processed, e.g.
initialization of the log system or context setup. To do this, we need
information that is present on the driver, but not on the executors. Our
current solution is to provide a base class for Spark function implementations
which carries the information from the driver and initializes everything in
its readObject method. Since multiple narrow-dependent functions may be
executed one after another in the same executor JVM, this class needs to make
sure that the initialization doesn't run multiple times. Sure, that's not hard
to do, but once you mix setup and cleanup logic for functions, partitions
and/or the JVM itself, it can get quite confusing without explicit hooks.
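The workaround above can be sketched roughly as follows. This is only an illustration, not actual Spark API: the names ExecutorSetup, DriverAwareFunction and UpperMapper are hypothetical, and the serialization round trip stands in for Spark shipping a task closure to an executor.

```java
import java.io.*;
import java.util.concurrent.atomic.AtomicInteger;

// One-time executor setup, guarded so that repeated deserializations in the
// same JVM run it at most once. initCount is exposed only to make the
// sketch observable; real code would hide it.
class ExecutorSetup {
    static final AtomicInteger initCount = new AtomicInteger(0);
    private static volatile boolean done = false;

    // Idempotent: several narrow-dependent functions may be deserialized in
    // the same executor JVM, but the setup must run at most once.
    static synchronized void initOnce(String driverInfo) {
        if (!done) {
            done = true;
            initCount.incrementAndGet();
            // e.g. configure the log system with settings captured on the driver
        }
    }
}

// Base class every Spark function must extend: readObject fires on the
// executor when the task closure is deserialized, before first use.
abstract class DriverAwareFunction implements Serializable {
    final String driverInfo; // information captured on the driver

    DriverAwareFunction(String driverInfo) { this.driverInfo = driverInfo; }

    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        ExecutorSetup.initOnce(driverInfo);
    }
}

// A concrete function; under this scheme it cannot be a plain lambda.
class UpperMapper extends DriverAwareFunction {
    UpperMapper(String info) { super(info); }
    String apply(String s) { return s.toUpperCase(); }
}

class SerDe {
    // Simulate shipping a closure to an executor: serialize, then deserialize.
    @SuppressWarnings("unchecked")
    static <T> T roundTrip(T obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray()))) {
            return (T) in.readObject();
        }
    }
}
```

A lambda passed to map would bypass DriverAwareFunction entirely, so initOnce would never run on the executor, which is exactly the failure mode described next.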
So, our solution basically works, but with that approach you can't use lambdas
for Spark functions, which is quite inconvenient, especially for simple map
operations. Even worse, if you use a lambda or otherwise forget to extend the
required base class, the initialization doesn't occur and very strange
exceptions follow, depending on which resource your function tries to access
during its execution. Or, if you are very unlucky, no exception occurs at all,
but the log messages get logged to an incorrect destination. It's very hard to
prevent such cases without an explicit initialization mechanism, and in a team
with several developers you can't expect everyone to know what is going on
there.
> Add a "setup hook" API for running initialization code on each executor
> -----------------------------------------------------------------------
>
> Key: SPARK-650
> URL: https://issues.apache.org/jira/browse/SPARK-650
> Project: Spark
> Issue Type: New Feature
> Components: Spark Core
> Reporter: Matei Zaharia
> Priority: Minor
>
> Would be useful to configure things like reporting libraries
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)