[ 
https://issues.apache.org/jira/browse/SPARK-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16110506#comment-16110506
 ] 

Sean Owen commented on SPARK-650:
---------------------------------

Are you looking for an example of how it works? Something like this, for what I 
assume is the common case of initializing a connection to an external resource:

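{code}
val config = ...
df.mapPartitions { it =>
  // Does the setup the first time it's called in this JVM; a cheap no-op afterwards.
  MyResource.initIfNeeded(config)
  it.map(...)
}

...

// A Scala object has one instance per JVM, so tasks on the same executor share this flag.
object MyResource {
  private var initted = false
  def initIfNeeded(config: Config): Unit = this.synchronized {
    if (!initted) {
      initializeResource(config)
      initted = true
    }
  }
}
{code}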
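A minimal, Spark-free sketch of the same pattern, to show that the setup really happens once even when several partitions hit it concurrently. It assumes a hypothetical `Config` case class, a placeholder `initializeResource` that just counts calls, and a `Demo` object that uses plain threads to stand in for tasks running on one executor JVM:

{code}
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical stand-in for whatever configuration the resource needs.
final case class Config(url: String)

object MyResource {
  // Counts real initializations so we can check it happens only once per JVM.
  val initCount = new AtomicInteger(0)

  private var initted = false

  // Hypothetical placeholder for the expensive setup (opening a connection, etc.).
  private def initializeResource(config: Config): Unit = {
    initCount.incrementAndGet()
  }

  def initIfNeeded(config: Config): Unit = this.synchronized {
    if (!initted) {
      initializeResource(config)
      initted = true
    }
  }
}

object Demo {
  // Simulates several partitions being processed concurrently in one executor JVM.
  def run(partitions: Int): Int = {
    val config = Config("jdbc:example")
    val threads = (1 to partitions).map { _ =>
      new Thread(() => MyResource.initIfNeeded(config))
    }
    threads.foreach(_.start())
    threads.foreach(_.join())
    MyResource.initCount.get()
  }
}
{code}

However many "tasks" call `initIfNeeded`, the counter ends at 1, because the check and the flag update both happen under the object's lock.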

If config is big, or tricky to pass around, it too can be read directly from 
some location, or wrapped up in an object in your code. Then it can simply be:

{code}
df.mapPartitions { it =>
  MyResource.initIfNeeded()
  it.map(...)
}

...

object MyResource {
  private var initted = false
  def initIfNeeded(): Unit = this.synchronized {
    if (!initted) {
      // Config is looked up here, on the executor, instead of being shipped in the closure.
      val config = getConf()
      initializeResource(config)
      initted = true
    }
  }
}
{code}

You get the idea. This is not a special technique, not even really a singleton; 
it's just a method that does its work the first time it's called and then does 
nothing after that. 
If you don't like having to call initIfNeeded explicitly, call it inside whatever 
code produces the resource connection.
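One way to fold the init into the code that hands out the connection is sketched below. It swaps Scala's `lazy val` in for the explicit flag: a `lazy val` in an `object` is computed on first access and Scala guards that computation with a lock, so each JVM builds the connection once. `Connection`, `getConf`, `initCount` and the `myresource.url` system property are all hypothetical names for illustration:

{code}
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical connection type standing in for a real client.
final class Connection(val url: String) {
  def send(msg: String): String = s"$url <- $msg"
}

object MyResource {
  // Counts how many times the connection is actually built.
  val initCount = new AtomicInteger(0)

  // Hypothetical config lookup; a real job might read a file or system property here.
  private def getConf(): String =
    sys.props.getOrElse("myresource.url", "jdbc:example")

  // Evaluated once, on first access, under a lock: the same role as initIfNeeded(),
  // but callers cannot forget to call it.
  private lazy val conn: Connection = {
    initCount.incrementAndGet()
    new Connection(getConf())
  }

  def connection: Connection = conn
}
{code}

Inside `mapPartitions`, a task would then just use `MyResource.connection` wherever it needs it, with no separate init call.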

We could go back and forth with objections and answers like this all day, I'm 
sure. I think this covers every use case I can imagine for a setup hook, so the 
only question is whether it's easy enough. You're saying it's unusably hard, and 
proposing a hack on the serializer that sounds much more error-prone. I just 
cannot agree with that. This is much simpler than the other solutions being 
argued against here, which I also think are too complex. Was this just a 
misunderstanding of the proposal?

[[email protected]] have you considered the implications of a setup hook's 
semantics? For example, if setup fails on an executor, can you still schedule a 
task that needed it there? How do you track that? Here, the semantics are 
obvious.
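To make the failure semantics of the initIfNeeded approach concrete, here is a minimal, Spark-free sketch assuming a hypothetical `FlakyResource` whose setup fails on its first attempt, and a hypothetical `runTask` standing in for one task. Because `initted` is only set after `initializeResource` returns, a failed setup leaves the flag false: the exception propagates out of the task, and the next call simply tries again. In Spark, that exception would fail the task like any other error and fall under the normal task-retry policy (`spark.task.maxFailures`), so, at least in this sketch, there is no separate "setup failed on this executor" state to track:

{code}
import java.util.concurrent.atomic.AtomicInteger

object FlakyResource {
  // Counts setup attempts, successful or not.
  val attempts = new AtomicInteger(0)
  private var initted = false

  // Hypothetical setup that fails on its first attempt, e.g. a service not up yet.
  private def initializeResource(): Unit = {
    if (attempts.incrementAndGet() == 1)
      throw new IllegalStateException("resource not reachable")
  }

  def initIfNeeded(): Unit = this.synchronized {
    if (!initted) {
      initializeResource() // if this throws, initted stays false
      initted = true
    }
  }
}

object FailureDemo {
  // Stand-in for one task: setup plus work; a setup failure becomes the task's failure.
  def runTask(): Either[String, String] =
    try {
      FlakyResource.initIfNeeded()
      Right("ok")
    } catch {
      case e: IllegalStateException => Left(e.getMessage)
    }
}
{code}

The first "task" fails with the setup error, the second retries setup and succeeds, and later ones skip setup entirely, so setup is attempted exactly twice.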

> Add a "setup hook" API for running initialization code on each executor
> -----------------------------------------------------------------------
>
>                 Key: SPARK-650
>                 URL: https://issues.apache.org/jira/browse/SPARK-650
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>            Reporter: Matei Zaharia
>            Priority: Minor
>
> Would be useful to configure things like reporting libraries



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
