Luis Ramos commented on SPARK-636:

I feel like the broadcasting mechanism doesn't get me "close" enough to solve 
my issue (initialization of a logging system). That's partly because 
initialization would be deferred (meaning a loss of useful early logs), and 
partly because a dedicated mechanism could give us init code that is 
'guaranteed' to be executed only once, as opposed to having to implement that 
'guarantee' yourself, which currently can lead to bad practices.
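
To illustrate what I mean by implementing the 'guarantee' yourself, this is 
roughly the pattern you end up with today (a minimal sketch; {{WorkerLogging}} 
is a hypothetical helper of mine, and it still relies on the 
{{parallelize().foreach}} trick described below actually reaching every 
executor):

{code}
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical per-JVM initialization guard, shipped in the application jar.
object WorkerLogging {
  private val initialized = new AtomicBoolean(false)

  // Safe to call from every task; the body runs at most once per executor JVM.
  def init(): Unit = {
    if (initialized.compareAndSet(false, true)) {
      // e.g. configure the logging system programmatically here
      println(s"logging initialized on ${java.net.InetAddress.getLocalHost.getHostName}")
    }
  }
}

// Driver side: hope that over-partitioning touches every executor,
// then call the guarded init from each task.
sc.parallelize(0 until numMachines, numMachines).foreach { _ => WorkerLogging.init() }
{code}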

> Add mechanism to run system management/configuration tasks on all workers
> -------------------------------------------------------------------------
>                 Key: SPARK-636
>                 URL: https://issues.apache.org/jira/browse/SPARK-636
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>            Reporter: Josh Rosen
> It would be useful to have a mechanism to run a task on all workers in order 
> to perform system management tasks, such as purging caches or changing system 
> properties.  This is useful for automated experiments and benchmarking; I 
> don't envision this being used for heavy computation.
> Right now, I can mimic this with something like
> {code}
> sc.parallelize(0 until numMachines, numMachines).foreach { _ => }
> {code}
> but this does not guarantee that every worker runs a task and requires my 
> user code to know the number of workers.
> One sample use case is setup and teardown for benchmark tests.  For example, 
> I might want to drop cached RDDs, purge shuffle data, and call 
> {{System.gc()}} between test runs.  It makes sense to incorporate some of 
> this functionality, such as dropping cached RDDs, into Spark itself, but it 
> might be helpful to have a general mechanism for running ad-hoc tasks like 
> {{System.gc()}}.
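
Relatedly, the benchmark setup/teardown described above ends up looking 
something like this today (again only a sketch, with the same caveat that 
nothing guarantees one task per worker):

{code}
// Drop cached RDDs from the driver (sketch; assumes nothing else needs them).
sc.getPersistentRDDs.values.foreach(_.unpersist(blocking = true))

// Then try to run System.gc() on every executor by over-partitioning a dummy job.
// numMachines has to come from user code, which is exactly the problem.
sc.parallelize(0 until numMachines, numMachines).foreach { _ => System.gc() }
{code}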
