[ https://issues.apache.org/jira/browse/SPARK-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14206143#comment-14206143 ]
Andrew Ash commented on SPARK-572:
----------------------------------
Static mutable variables are now a standard way of having code run on a
per-executor basis.
To run code per entry you can use map(), and per partition you can use
mapPartitions(), but per executor you need static variables or
initializers. For example, if you want to open a connection to another data
storage system and write all of an executor's data into that system, a static
connection object is the common way to do that.
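A minimal sketch of that pattern, under stated assumptions: SinkConnection,
SinkHolder and PerExecutorExample are hypothetical names made up for
illustration (none of them is a Spark API), and the "local[2]" master is only
there so the example runs on one machine. It relies on a Scala object being a
per-JVM singleton, so each executor JVM builds its own connection lazily the
first time a task touches it.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical stand-in for a real client (JDBC, HTTP, ...); not part of Spark.
class SinkConnection {
  def write(record: String): Unit = println(s"wrote $record")
}

// A Scala `object` is a per-JVM singleton, so each executor JVM initializes
// its own copy. `lazy val` defers creation until a task first uses it.
object SinkHolder {
  lazy val connection: SinkConnection = new SinkConnection
}

object PerExecutorExample {
  def main(args: Array[String]): Unit = {
    // "local[2]" is an assumption for a single-machine run.
    val conf = new SparkConf().setAppName("per-executor-example").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      sc.parallelize(0 to 10).foreachPartition { records =>
        // Looked up in the JVM running the task, not serialized from the driver.
        val conn = SinkHolder.connection
        records.foreach(i => conn.write(i.toString))
      }
    } finally {
      sc.stop()
    }
  }
}
```

Note that in local mode the driver and the tasks share one JVM, so there is
only a single SinkHolder; on a cluster each executor JVM would get its own.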
I would propose closing this ticket as "Won't Fix". Using this technique is
confusing, but prohibiting it would be difficult and would introduce additional
roadblocks for Spark power users.
cc [~rxin]
> Forbid update of static mutable variables
> -----------------------------------------
>
> Key: SPARK-572
> URL: https://issues.apache.org/jira/browse/SPARK-572
> Project: Spark
> Issue Type: Improvement
> Reporter: tjhunter
>
> Consider the following piece of code:
> <pre>
> object Foo {
>   var xx = -1
>   def main() {
>     xx = 1
>     val sc = new SparkContext(...)
>     sc.broadcast(xx)
>     sc.parallelize(0 to 10).map(i=>{ ... xx ...})
>   }
> }
> </pre>
> Can you guess the value of xx? It is 1 when you use the local scheduler and
> -1 when you use the Mesos scheduler. Given the complications, it should
> probably just be forbidden for now...