Tom Howland created SPARK-34033:
-----------------------------------
Summary: SparkR Daemon Initialization
Key: SPARK-34033
URL: https://issues.apache.org/jira/browse/SPARK-34033
Project: Spark
Issue Type: Improvement
Components: R, SparkR
Affects Versions: 3.2.0
Environment: Tested on CentOS 7 with Spark 2.3.1, and on my Mac with Spark at master
Reporter: Tom Howland
Provide a way for users to initialize the SparkR daemon before it forks.
The proposal is described in detail in
[docs/sparkr.md|https://github.com/WamBamBoozle/spark/blob/daemon_init/docs/sparkr.md#daemon-initialization]
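Why initializing before the fork helps can be illustrated with a small Python analogy (POSIX only, since it uses os.fork). This is not SparkR's daemon code: expensive_init() is a hypothetical stand-in for loading R libraries, and run_tasks() simply forks one child per task, the way a daemon serves partitions. It counts how many times the setup runs. When the parent does the setup first, every child inherits the loaded state from the parent's memory instead of repeating it.

```python
import os

def expensive_init():
    # Hypothetical stand-in for an expensive setup step such as loading
    # R libraries (about 5 seconds each in the report above).
    return {"libs": "loaded"}

def run_tasks(n_tasks, init_before_fork):
    """Fork one child per task and return how many times expensive_init() ran."""
    init_count = 0
    state = None
    if init_before_fork:
        state = expensive_init()      # done once, in the parent
        init_count += 1
    for _ in range(n_tasks):
        r, w = os.pipe()
        pid = os.fork()
        if pid == 0:                  # child process
            os.close(r)
            ran_init = 0
            if state is None:         # nothing inherited, so set up again
                expensive_init()
                ran_init = 1
            os.write(w, str(ran_init).encode())
            os.close(w)
            os._exit(0)               # exit without running parent cleanup
        os.close(w)                   # parent keeps only the read end
        with os.fdopen(r, "rb") as f:
            init_count += int(f.read().decode())
        os.waitpid(pid, 0)
    return init_count
```

With four tasks, run_tasks(4, init_before_fork=False) returns 4 (one setup per child), while run_tasks(4, init_before_fork=True) returns 1 (one setup in the parent, shared by all children).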
I'm a contractor to Target, where we have several projects doing ML with
SparkR. The changes proposed here result in weeks of compute time saved with
every run.
(40000 partitions) * (5 seconds to load our R libraries) * (2 calls to gapply in our app) / 60 / 60 ≈ 111 hours.
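The estimate above can be checked with a few lines of Python. This is a sketch under the assumption that the libraries are loaded once per partition per gapply call and that daemon initialization removes that whole cost (the single up-front load per daemon is ignored); the function name hours_saved is hypothetical.

```python
def hours_saved(partitions, load_seconds, gapply_calls):
    """Library-load time avoided, in hours, if no partition reloads the libraries."""
    total_seconds = partitions * load_seconds * gapply_calls
    return total_seconds / 60 / 60

print(f"{hours_saved(40000, 5, 2):.1f}")  # prints 111.1
```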
--
This message was sent by Atlassian Jira
(v8.3.4#803005)