[
https://issues.apache.org/jira/browse/SPARK-14880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ahmed Mahran updated SPARK-14880:
---------------------------------
External issue URL:
https://github.com/mashin-io/rich-spark/blob/master/main/src/main/scala/org/apache/spark/mllib/optimization/ParallelSGD.scala
(was:
https://github.com/mashin-io/rich-spark/blob/master/src/main/scala/org/apache/spark/mllib/optimization/ParallelSGD.scala)
> Parallel Gradient Descent with less map-reduce shuffle overhead
> ---------------------------------------------------------------
>
> Key: SPARK-14880
> URL: https://issues.apache.org/jira/browse/SPARK-14880
> Project: Spark
> Issue Type: Improvement
> Components: MLlib
> Reporter: Ahmed Mahran
> Labels: performance
>
> The current implementation of (Stochastic) Gradient Descent performs one
> map-reduce shuffle per iteration. Moreover, as the sampling fraction gets
> smaller, the algorithm becomes shuffle-bound rather than CPU-bound.
> {code}
> (1 to numIterations or convergence) {
>   rdd
>     .sample(fraction)
>     .map(Gradient)
>     .reduce(Update)
> }
> {code}
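> As a concrete illustration, here is a minimal runnable sketch of that
> per-iteration pattern. The squared-loss gradient helper and all names here
> are hypothetical stand-ins for mllib's Gradient/Updater classes; the actual
> implementation is GradientDescent.runMiniBatchSGD, which likewise samples
> and then aggregates once per iteration:
> {code}
> import breeze.linalg.DenseVector
> import org.apache.spark.rdd.RDD
>
> object SGDSketches {
>
>   // Hypothetical squared-loss gradient for one example (label y, features x);
>   // stands in for mllib's Gradient/Updater pair.
>   def gradient(w: DenseVector[Double], x: DenseVector[Double], y: Double): DenseVector[Double] =
>     x * ((w dot x) - y)
>
>   def runPerIteration(
>       data: RDD[(Double, DenseVector[Double])],
>       numIterations: Int,
>       fraction: Double,
>       stepSize: Double,
>       dim: Int): DenseVector[Double] = {
>     val w = DenseVector.zeros[Double](dim)
>     for (i <- 1 to numIterations) {
>       val currentW = w.copy  // shipped with the closure; mllib broadcasts it instead
>       // Each iteration launches a full job: sample, per-example gradients,
>       // and a cluster-wide aggregation back to the driver.
>       val (gradSum, count) = data
>         .sample(withReplacement = false, fraction, 42L + i)
>         .treeAggregate((DenseVector.zeros[Double](dim), 0L))(
>           seqOp = { case ((g, c), (y, x)) => (g + gradient(currentW, x, y), c + 1L) },
>           combOp = { case ((g1, c1), (g2, c2)) => (g1 + g2, c1 + c2) })
>       if (count > 0) w -= gradSum * (stepSize / count.toDouble)
>     }
>     w
>   }
> }
> {code}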
> A more performant variation requires only one map-reduce pass regardless of
> the number of iterations. A local mini-batch SGD could be run on each
> partition, and the results then averaged. This is based on Zinkevich et al.
> (Martin Zinkevich, Markus Weimer, Lihong Li, and Alex J. Smola. "Parallelized
> Stochastic Gradient Descent." In Advances in Neural Information Processing
> Systems, 2010,
> http://www.research.rutgers.edu/~lihong/pub/Zinkevich11Parallelized.pdf).
> {code}
> rdd
>   .shuffle()
>   .mapPartitions((1 to numIterations or convergence) {
>     iter.sample(fraction).map(Gradient).reduce(Update)
>   })
>   .reduce(Average)
> {code}
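> Here is a corresponding sketch of the single-shuffle variant, written as
> another method on the hypothetical SGDSketches object above so it can reuse
> the same gradient helper. repartition plays the role of shuffle(), the inner
> loop samples from an in-memory array so no Spark job runs per inner
> iteration, and the per-partition solutions are averaged at the end:
> {code}
> def runSingleShuffle(
>     data: RDD[(Double, DenseVector[Double])],
>     numIterations: Int,
>     fraction: Double,
>     stepSize: Double,
>     dim: Int): DenseVector[Double] = {
>   val (wSum, parts) = data
>     .repartition(data.partitions.length)  // the one up-front shuffle
>     .mapPartitionsWithIndex { (pid, iter) =>
>       val examples = iter.toArray         // assumes a partition fits in memory
>       val rnd = new scala.util.Random(42L + pid)
>       val local = DenseVector.zeros[Double](dim)
>       // Local mini-batch SGD: no cluster communication inside this loop.
>       for (_ <- 1 to numIterations) {
>         val batch = examples.filter(_ => rnd.nextDouble() < fraction)
>         if (batch.nonEmpty) {
>           val g = batch.map { case (y, x) => gradient(local, x, y) }.reduce(_ + _)
>           local -= g * (stepSize / batch.length.toDouble)
>         }
>       }
>       Iterator((local, 1L))
>     }
>     .reduce { case ((w1, c1), (w2, c2)) => (w1 + w2, c1 + c2) }  // Average
>   wSum / parts.toDouble
> }
> {code}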
> A higher-level iteration could enclose the above variation, shuffling the
> data before the local mini-batches and feeding the average weights from the
> last iteration back in. This allows more variability in the sampling of the
> mini-batches, with the possibility of covering the whole dataset. Here is a
> Spark-based implementation:
> https://github.com/mashin-io/rich-spark/blob/master/main/src/main/scala/org/apache/spark/mllib/optimization/ParallelSGD.scala
> {code}
> (1 to numIterations1 or convergence) {
>   rdd
>     .shuffle()
>     .mapPartitions((1 to numIterations2 or convergence) {
>       iter.sample(fraction).map(Gradient).reduce(Update)
>     })
>     .reduce(Average)
> }
> {code}
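> And a sketch of the enclosing loop under the same assumptions (again a
> hypothetical method on SGDSketches, not the linked ParallelSGD.scala itself),
> broadcasting each round's average back as the starting point for the next:
> {code}
> def runIterative(
>     data: RDD[(Double, DenseVector[Double])],
>     numIterations1: Int,
>     numIterations2: Int,
>     fraction: Double,
>     stepSize: Double,
>     dim: Int): DenseVector[Double] = {
>   var w = DenseVector.zeros[Double](dim)
>   for (outer <- 1 to numIterations1) {
>     val wBc = data.context.broadcast(w)     // feed the average back to all partitions
>     val (wSum, parts) = data
>       .repartition(data.partitions.length)  // fresh shuffle per outer round
>       .mapPartitionsWithIndex { (pid, iter) =>
>         val examples = iter.toArray
>         val rnd = new scala.util.Random(1000L * outer + pid)
>         val local = wBc.value.copy          // start from the fed-back average
>         for (_ <- 1 to numIterations2) {
>           val batch = examples.filter(_ => rnd.nextDouble() < fraction)
>           if (batch.nonEmpty) {
>             val g = batch.map { case (y, x) => gradient(local, x, y) }.reduce(_ + _)
>             local -= g * (stepSize / batch.length.toDouble)
>           }
>         }
>         Iterator((local, 1L))
>       }
>       .reduce { case ((w1, c1), (w2, c2)) => (w1 + w2, c1 + c2) }
>     w = wSum / parts.toDouble
>   }
>   w
> }
> {code}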
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)