[ https://issues.apache.org/jira/browse/SPARK-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15722170#comment-15722170 ]
Michael Schmeißer commented on SPARK-650:
-----------------------------------------

A singleton is not really feasible if additional information that is known (or determined) by the driver needs to be sent to the executors before initialization can happen. In this case, the options are:

1) use some side channel that is "magically" inferred by the executor,
2) create an empty RDD, repartition it to the number of executors, and run mapPartitions on it,
3) piggyback on the JavaSerializer to run the initialization before any function is called, or
4) require every function that may need the resource to initialize it on its own.

Each of these options has significant drawbacks in my opinion. While option 4 sounds good for most cases, it has some cons which I've described earlier (my comment from Oct 16) that make it unfeasible for our use case. Option 1 might be possible, but the data flow wouldn't be all that obvious. Right now, we go with a mix of options 2 and 3 (try to determine the number of executors and, if we can't, hijack the serializer), but really, this is a hack and might break in future releases of Spark.

> Add a "setup hook" API for running initialization code on each executor
> -----------------------------------------------------------------------
>
>                 Key: SPARK-650
>                 URL: https://issues.apache.org/jira/browse/SPARK-650
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>            Reporter: Matei Zaharia
>            Priority: Minor
>
> Would be useful to configure things like reporting libraries
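As a minimal, Spark-independent sketch of what option 4 amounts to: each task function lazily initializes a per-JVM (hence per-executor) resource on first use, with the driver-known configuration captured in the task closure. The class and resource names below (`LazyExecutorInit`, `ReportingClient`, the endpoint value) are illustrative assumptions, not Spark APIs:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class LazyExecutorInit {
    // Stand-in for an expensive resource, e.g. a reporting-library client.
    static class ReportingClient {
        final String endpoint;
        ReportingClient(String endpoint) { this.endpoint = endpoint; }
    }

    // Counts how many times initialization actually ran (for demonstration).
    static final AtomicInteger initCount = new AtomicInteger();
    private static volatile ReportingClient client;

    // Double-checked lazy init: runs at most once per JVM, i.e. once per
    // executor, no matter how many tasks call it concurrently.
    static ReportingClient getOrInit(String endpoint) {
        ReportingClient c = client;
        if (c == null) {
            synchronized (LazyExecutorInit.class) {
                if (client == null) {
                    initCount.incrementAndGet();
                    client = new ReportingClient(endpoint);
                }
                c = client;
            }
        }
        return c;
    }

    public static void main(String[] args) {
        // On a real cluster, this config would travel to the executors inside
        // the serialized task closure; here we pass it directly. Eight
        // simulated "tasks" call getOrInit, but initialization runs once.
        String endpoint = "http://metrics.example.com"; // assumed value
        for (int task = 0; task < 8; task++) {
            getOrInit(endpoint);
        }
        System.out.println("initCount=" + initCount.get());
    }
}
```

This illustrates why option 4 works mechanically, but also its drawback: every function that might touch the resource must carry the configuration and call the initializer itself, which is exactly what a driver-side "setup hook" API would avoid.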