miniBatchFraction uses RDD.sample to get the mini-batch, and sample
still needs to visit the elements one after another. So it is not
efficient if the task is not computation-heavy, which is why
setMiniBatchFraction is marked as experimental. If we can detect that
the partition iterator is backed by an ArrayBuffer, maybe we could use
a skip iterator to skip over the unsampled elements. -Xiangrui

On Tue, Aug 26, 2014 at 8:15 AM, Ulanov, Alexander
<alexander.ula...@hp.com> wrote:
> Hi, RJ
>
> https://github.com/avulanov/spark/blob/neuralnetwork/mllib/src/main/scala/org/apache/spark/mllib/classification/NeuralNetwork.scala
>
> Unit tests are in the same branch.
>
> Alexander
>
> From: RJ Nowling [mailto:rnowl...@gmail.com]
> Sent: Tuesday, August 26, 2014 6:59 PM
> To: Ulanov, Alexander
> Cc: dev@spark.apache.org
> Subject: Re: Gradient descent and runMiniBatchSGD
>
> Hi Alexander,
>
> Can you post a link to the code?
>
> RJ
>
> On Tue, Aug 26, 2014 at 6:53 AM, Ulanov, Alexander
> <alexander.ula...@hp.com> wrote:
> Hi,
>
> I've implemented the back-propagation algorithm using the Gradient class and a
> simple update using the Updater class, and I run the algorithm with MLlib's
> GradientDescent class. I am having trouble scaling out this implementation. I
> thought that if I partitioned my data across the number of workers, then
> performance would increase, because each worker would run a step of gradient
> descent on its partition of the data. But this does not happen, and each worker
> seems to process all the data (if miniBatchFraction == 1.0, as in MLlib's
> logistic regression implementation). For me, this doesn't make sense, because
> then a single worker would provide the same performance. Could someone
> elaborate on this and correct me if I am wrong? How can I scale out the
> algorithm with many workers?
>
> Best regards, Alexander
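[For context on the question above: runMiniBatchSGD does distribute the work. With miniBatchFraction == 1.0 each worker computes a partial (gradient sum, count) over its own partition only, and the partials are then combined on the driver. A self-contained local sketch of that aggregation pattern, with hypothetical names and a least-squares gradient, no Spark dependency:]

```scala
// Simplified local model of one mini-batch SGD step: each "partition"
// computes a partial (gradientSum, count) over its own data only, and
// the partials are combined afterwards -- a worker never sees another
// partition's data. Gradient for 1-D least squares: (w*x - y) * x.
def partitionGrad(part: Seq[(Double, Double)], w: Double): (Double, Long) =
  part.foldLeft((0.0, 0L)) { case ((g, n), (x, y)) => (g + (w * x - y) * x, n + 1) }

def sgdStep(partitions: Seq[Seq[(Double, Double)]], w: Double, stepSize: Double): Double = {
  // Stand-in for the aggregate the driver performs across partitions.
  val (gradSum, count) = partitions
    .map(partitionGrad(_, w))
    .reduce((a, b) => (a._1 + b._1, a._2 + b._2))
  w - stepSize * gradSum / count
}
```

[Each partition's pass takes time proportional to its own size, so adding workers shrinks per-worker work even though the full dataset is touched once per iteration.]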
>
>
>
> --
> em rnowl...@gmail.com
> c 954.496.2314

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org
