[ https://issues.apache.org/jira/browse/SPARK-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16110506#comment-16110506 ]
Sean Owen commented on SPARK-650:
---------------------------------
Are you looking for an example of how it works? Something like this, for what I
assume is the common case of initializing a connection to an external resource:
{code}
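val config = ...
df.mapPartitions { it =>
  // Called once per partition, but only does real work on the first call in each executor JVM
  MyResource.initIfNeeded(config)
  it.map(...)
}
...
// A Scala object is a singleton per JVM, so each executor has its own copy
object MyResource {
  private var initted = false
  def initIfNeeded(config: Config): Unit = this.synchronized {
    if (!initted) {
      initializeResource(config)
      initted = true
    }
  }
}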
{code}
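To see the once-per-JVM behavior without a cluster, here is a minimal sketch that simulates one executor by running four "partitions" on separate threads in the same JVM. Config, initializeResource, the URL value and initCount are hypothetical stand-ins for illustration, not Spark APIs. All four threads call initIfNeeded, but the synchronized flag means the setup body runs only once.

{code}
// Hypothetical config type standing in for whatever `config` is above.
final case class Config(url: String)

object MyResource {
  private var initted = false
  @volatile var initCount = 0 // counts real initializations, for demonstration only

  // Hypothetical stand-in for opening a connection, registering a reporter, etc.
  private def initializeResource(config: Config): Unit = {
    initCount += 1
  }

  def initIfNeeded(config: Config): Unit = this.synchronized {
    if (!initted) {
      initializeResource(config)
      initted = true
    }
  }
}

val config = Config("jdbc:example://db-host/metrics") // hypothetical value
// Four "partitions" of one executor, each processed on its own thread.
val partitions: Seq[Seq[Int]] = Seq(Seq(1, 2), Seq(3, 4), Seq(5, 6), Seq(7, 8))
val results = new java.util.concurrent.ConcurrentLinkedQueue[Int]()

// Same shape as the mapPartitions body above.
def processPartition(it: Iterator[Int]): Iterator[Int] = {
  MyResource.initIfNeeded(config)
  it.map(_ * 10)
}

val threads = partitions.map { p =>
  new Thread(() => processPartition(p.iterator).foreach(x => results.add(x)))
}
threads.foreach(_.start())
threads.foreach(_.join())

println(s"initializations: ${MyResource.initCount}, results: ${results.size}")
{code}

On a real cluster each executor is its own JVM with its own copy of MyResource, so the setup runs once per executor rather than once per job.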
If the config is big, or tricky to pass around, it too can be read directly
from some location, or wrapped up in an object in your code. Then it can be just:
{code}
df.mapPartitions { it =>
  MyResource.initIfNeeded()
  it.map(...)
}
...
object MyResource {
  private var initted = false
  def initIfNeeded(): Unit = this.synchronized {
    if (!initted) {
      // The config is loaded inside the one-time block instead of being passed in
      val config = getConf()
      initializeResource(config)
      initted = true
    }
  }
}
{code}
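A minimal sketch of the no-argument variant, assuming the settings live in a JVM system property. The property name myresource.endpoint, the default URL and the loadedFrom field are hypothetical. Because the config is read inside the one-time block, changing the property after the first call has no effect.

{code}
object MyResource {
  private var initted = false
  @volatile var loadedFrom: Option[String] = None // records what was loaded, for demonstration

  // Hypothetical: read settings from a JVM system property (the name is an assumption).
  private def getConf(): String =
    sys.props.getOrElse("myresource.endpoint", "http://localhost:8080")

  // Stand-in for the real setup work.
  private def initializeResource(endpoint: String): Unit = {
    loadedFrom = Some(endpoint)
  }

  def initIfNeeded(): Unit = this.synchronized {
    if (!initted) {
      val config = getConf()
      initializeResource(config)
      initted = true
    }
  }
}

sys.props("myresource.endpoint") = "http://metrics.example:9090"
MyResource.initIfNeeded()
sys.props("myresource.endpoint") = "http://changed.example:1234" // ignored: already initialized
MyResource.initIfNeeded()
println(MyResource.loadedFrom)
{code}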
You get the idea. This is not a special technique; it isn't even really about
singletons. It's just a method that does its work the first time it's called and
nothing after that.
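Scala also has this behavior built in: a lazy val's initializer runs on first access and never again, and the language makes that initialization thread-safe. This is an alternative to the synchronized flag, not something proposed above; the sketch assumes the setup work fits in the lazy val's right-hand side, and initCount is a hypothetical counter for demonstration.

{code}
object MyResource {
  @volatile var initCount = 0 // demonstration only

  // Hypothetical setup work. The right-hand side runs on first access only;
  // concurrent first accesses are guarded, so it still runs once.
  private lazy val initialized: Unit = {
    initCount += 1
  }

  def initIfNeeded(): Unit = initialized
}

(1 to 5).foreach(_ => MyResource.initIfNeeded())
println(MyResource.initCount)
{code}

As with the synchronized flag, if the initializer throws, the lazy val stays uninitialized and the next access runs it again.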
If you don't like having to call initIfNeeded explicitly, call it from whatever
code produces the resource connection.
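A minimal sketch of folding the setup into the accessor that hands out the connection, so callers never call an init method themselves. Connection, its send method and the endpoint string are hypothetical placeholders for a real client.

{code}
// Hypothetical connection type, standing in for a JDBC connection, HTTP client, etc.
final class Connection(val endpoint: String) {
  def send(record: String): String = s"$endpoint <- $record"
}

object MyResource {
  private var conn: Connection = null // guarded by this.synchronized
  @volatile var openCount = 0         // demonstration only

  // Callers just ask for the connection; setup happens on the first request.
  def connection(): Connection = this.synchronized {
    if (conn == null) {
      conn = new Connection("metrics.example:9090") // hypothetical endpoint
      openCount += 1
    }
    conn
  }
}

// Inside mapPartitions this would be: it.map(r => MyResource.connection().send(r))
val sent = Iterator("a", "b", "c").map(r => MyResource.connection().send(r)).toList
println(sent)
{code}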
I'm sure we could trade objections and answers like this all day. I think this
covers every use case I can imagine for a setup hook, so the question is just
whether it's easy enough. You're saying it's unusably hard, and proposing a
hack on the serializer that sounds much more error-prone. I just can't agree
with that. This is much simpler than the other solutions people are arguing
against here, which I also think are too complex. Was this just a
misunderstanding of the proposal?
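[[email protected]] have you considered the implications of the
semantics of a setup hook? For example, if setup fails on an executor, can you
schedule a task that needed it? How do you track that? Here, the semantics are
obvious: if initialization throws, the task that triggered it fails, and the
flag stays unset, so the next task tries again.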
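A minimal sketch of those failure semantics, using a hypothetical failNext switch to simulate one transient setup failure. The exception propagates out of initIfNeeded, so the task that needed the resource fails; the flag is only set after initializeResource returns, so the next call retries setup. On a cluster that next call would come from Spark's normal task retry (bounded by spark.task.maxFailures), on whichever executor runs it.

{code}
object MyResource {
  private var initted = false
  @volatile var attempts = 0    // demonstration only
  @volatile var failNext = true // hypothetical switch to simulate a transient failure

  private def initializeResource(): Unit = {
    attempts += 1
    if (failNext) {
      failNext = false
      throw new IllegalStateException("endpoint unavailable")
    }
  }

  def initIfNeeded(): Unit = this.synchronized {
    if (!initted) {
      initializeResource() // if this throws, initted stays false
      initted = true
    }
  }
}

// First "task": the exception propagates, so the task that needed the resource fails.
val first = scala.util.Try(MyResource.initIfNeeded())
// A retried task simply attempts initialization again.
val second = scala.util.Try(MyResource.initIfNeeded())
// Later tasks find it already initialized and do nothing.
val third = scala.util.Try(MyResource.initIfNeeded())
println(s"$first / $second / $third, attempts = ${MyResource.attempts}")
{code}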
> Add a "setup hook" API for running initialization code on each executor
> -----------------------------------------------------------------------
>
> Key: SPARK-650
> URL: https://issues.apache.org/jira/browse/SPARK-650
> Project: Spark
> Issue Type: New Feature
> Components: Spark Core
> Reporter: Matei Zaharia
> Priority: Minor
>
> Would be useful to configure things like reporting libraries