[ https://issues.apache.org/jira/browse/SPARK-4423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14313036#comment-14313036 ]
Josh Rosen commented on SPARK-4423:
-----------------------------------

[~pwendell] It's also true for those operators, too; I thought that it might be worth highlighting {{foreach}} as a special case because I've seen that {{foreach(println)}} has been a source of confusion for several new users. Maybe we can sync with the training team to see if they have any insight here.

> Improve foreach() documentation to avoid confusion between local- and cluster-mode behavior
> -------------------------------------------------------------------------------------------
>
>                 Key: SPARK-4423
>                 URL: https://issues.apache.org/jira/browse/SPARK-4423
>             Project: Spark
>          Issue Type: Improvement
>          Components: Documentation
>            Reporter: Josh Rosen
>            Assignee: Ilya Ganelin
>
> {{foreach}} seems to be a common source of confusion for new users: in
> {{local}} mode, {{foreach}} can be used to update local variables on the
> driver, but programs that do this will not work properly when executed on
> clusters, since the {{foreach}} will update per-executor variables (note that
> this _will_ work correctly for accumulators, but not for other types of
> mutable objects).
>
> Similarly, I've seen users become confused when {{.foreach(println)}} doesn't
> print to the driver's standard output.
>
> At a minimum, we should improve the documentation to warn users against
> unsafe uses of {{foreach}} that won't work properly when transitioning from
> local mode to a real cluster.
>
> We might also consider changes to local mode so that its behavior more
> closely matches the cluster modes; this will require some discussion, though,
> since any change of behavior here would technically be a user-visible
> backwards-incompatible change (I don't think that we made any explicit
> guarantees about the current local-mode behavior, but someone might be
> relying on the current implicit behavior).
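
To make the local-versus-cluster difference described in the issue concrete, here is a minimal sketch in plain Python that does not use Spark at all. It only simulates the behavior: `copy.deepcopy` stands in for shipping a serialized closure to an executor, and the names `AddToCounter`, `foreach_local`, `foreach_cluster`, and `foreach_cluster_accumulating` are hypothetical helpers invented for this illustration, not Spark APIs.

```python
import copy


class AddToCounter:
    """Hypothetical stand-in for a closure that mutates captured state."""

    def __init__(self):
        self.counter = 0

    def __call__(self, value):
        self.counter += value


def foreach_local(data, func):
    # Local mode as the issue describes it: func runs against the
    # driver's own object, so the mutation is visible afterwards.
    for value in data:
        func(value)


def foreach_cluster(partitions, func):
    # Cluster mode, simulated: each "executor" works on its own copy of
    # func (deepcopy stands in for serialization). The copies are
    # discarded, so the driver's object never changes.
    for partition in partitions:
        shipped = copy.deepcopy(func)
        for value in partition:
            shipped(value)


def foreach_cluster_accumulating(partitions, func):
    # Accumulator-style, simulated: each executor reports its partial
    # result back and the driver merges the reports.
    total = 0
    for partition in partitions:
        shipped = copy.deepcopy(func)
        for value in partition:
            shipped(value)
        total += shipped.counter
    return total


data = [1, 2, 3, 4]
partitions = [[1, 2], [3, 4]]

local_task = AddToCounter()
foreach_local(data, local_task)

cluster_task = AddToCounter()
foreach_cluster(partitions, cluster_task)

accumulated = foreach_cluster_accumulating(partitions, AddToCounter())

print(local_task.counter)    # 10: the driver's object was mutated
print(cluster_task.counter)  # 0: only the per-executor copies changed
print(accumulated)           # 10: partial results merged on the driver
```

The same reasoning suggests why {{.foreach(println)}} surprises people: the function runs wherever the copy runs, so its output goes to the executor's standard output rather than the driver's.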