Yes. That is feasible.

I think that you would have better luck with something like asynchronous
SGD as described here:

   http://machinelearning.wustl.edu/mlpapers/paper_files/NIPS2012_0598.pdf

and here

   http://www.cs.toronto.edu/~fritz/absps/georgerectified.pdf

It would also be worth looking at some of the new Scala work in
Mahout.  Map-reduce is a difficult medium for this art.
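
To make the asynchronous idea concrete, here is a toy single-process
sketch in Java: worker threads repeatedly fetch the current weights,
compute a gradient on an example of their own, and push the update back
without waiting for each other.  The papers above do this across machines
with sharded parameter servers; the ParameterServer class, the
logistic-regression gradient, and every name below are just illustrative.

import java.util.Random;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Toy asynchronous SGD: a shared parameter store plus workers that fetch
// weights, compute a per-example gradient, and push updates back without
// coordinating with each other.
public class AsyncSgdSketch {

  // Stand-in for a parameter server; push() applies an update in place.
  static class ParameterServer {
    private final double[] w;
    ParameterServer(int dim) { w = new double[dim]; }
    synchronized double[] fetch() { return w.clone(); }
    synchronized void push(double[] grad, double eta) {
      for (int j = 0; j < w.length; j++) w[j] -= eta * grad[j];
    }
  }

  public static void main(String[] args) throws InterruptedException {
    final int dim = 10, workers = 4, steps = 10000;
    final double eta = 0.01;
    final ParameterServer ps = new ParameterServer(dim);
    ExecutorService pool = Executors.newFixedThreadPool(workers);

    for (int i = 0; i < workers; i++) {
      final long seed = i;
      pool.submit(new Runnable() {
        public void run() {
          Random rnd = new Random(seed);
          for (int t = 0; t < steps; t++) {
            // Fake one training example; a real worker would stream
            // its own shard of the data here instead.
            double[] x = new double[dim];
            for (int j = 0; j < dim; j++) x[j] = rnd.nextGaussian();
            double y = x[0] > 0 ? 1.0 : 0.0;

            // Logistic-regression gradient computed from a possibly
            // stale copy of the shared weights.
            double[] w = ps.fetch();
            double margin = 0.0;
            for (int j = 0; j < dim; j++) margin += w[j] * x[j];
            double p = 1.0 / (1.0 + Math.exp(-margin));
            double[] grad = new double[dim];
            for (int j = 0; j < dim; j++) grad[j] = (p - y) * x[j];

            ps.push(grad, eta);   // apply immediately, no barrier
          }
        }
      });
    }
    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.MINUTES);
    System.out.println("w[0] after training: " + ps.fetch()[0]);
  }
}

The only point here is that updates are applied as they arrive, computed
from possibly stale weights, rather than in lock-step averaging rounds.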




On Fri, Mar 28, 2014 at 5:21 AM, Li Li <[email protected]> wrote:

> I have read "Parallelized Stochastic Gradient Descent" (2010) by
> Martin A. Zinkevich et al.
> The parallel SGD algorithm is very simple:
>
> Define T = ⌊m/k⌋
> Randomly partition the examples, giving T examples to each machine.
> for all i ∈ {1, ..., k} in parallel do
>     Randomly shuffle the data on machine i.
>     Initialize w_{i,0} = 0.
>     for all t ∈ {1, ..., T} do
>         Get the t-th example on the i-th machine (this machine), c_{i,t}
>         w_{i,t} ← w_{i,t−1} − η ∂_w c_{i,t}(w_{i,t−1})
>     end for
> end for
> Aggregate from all computers: v = (1/k) Σ_{i=1}^{k} w_{i,T} and return v.
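>
> In code, one machine's inner loop plus the final averaging step would
> look roughly like this (the logistic-regression gradient is just a
> stand-in for the generic ∂_w c in the paper; the names are mine):
>
> import java.util.Random;
>
> // One machine's pass of the algorithm above, plus the final averaging.
> class ParallelSgdSketch {
>
>   // Machine i: w_{i,0} = 0, then one pass over the locally shuffled
>   // examples with w_{i,t} = w_{i,t-1} - eta * gradient.
>   static double[] localSgd(double[][] xs, double[] ys, double eta, long seed) {
>     int T = xs.length, dim = xs[0].length;
>     double[] w = new double[dim];
>     int[] order = new int[T];
>     for (int t = 0; t < T; t++) order[t] = t;
>     Random rnd = new Random(seed);
>     for (int t = T - 1; t > 0; t--) {            // local random shuffle
>       int s = rnd.nextInt(t + 1);
>       int tmp = order[t]; order[t] = order[s]; order[s] = tmp;
>     }
>     for (int t = 0; t < T; t++) {
>       double[] x = xs[order[t]];
>       double margin = 0.0;
>       for (int j = 0; j < dim; j++) margin += w[j] * x[j];
>       double p = 1.0 / (1.0 + Math.exp(-margin));
>       for (int j = 0; j < dim; j++) w[j] -= eta * (p - ys[order[t]]) * x[j];
>     }
>     return w;                                    // this machine's w_{i,T}
>   }
>
>   // Final step: v = (1/k) * sum over machines of w_{i,T}.
>   static double[] average(double[][] locals) {
>     double[] v = new double[locals[0].length];
>     for (double[] w : locals)
>       for (int j = 0; j < v.length; j++) v[j] += w[j] / locals.length;
>     return v;
>   }
> }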
>
> It assumes that each machine randomly shuffles its local data and runs
> SGD on it locally.
>
> It seems each machine has to load all of its local data into memory,
> shuffle it, and run SGD over it, and then the resulting weight vectors
> are averaged.
>
> How can I do this in Hadoop?
>
> 1. How do I control the Hadoop input split size?
>       Should I just let Hadoop decide for me? Each split must be small
> enough to fit in memory, though (one way to cap it is shown after the
> sketch below).
> 2. Do it as a batch, roughly as sketched below?
>       In the Mapper's setup(), construct a data structure to hold all
> the data of this split;
>       in map(), just add each record to that data structure;
>       in cleanup() (close() in the old API), do the real SGD work.
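>
> Something like this rough sketch (new mapreduce API, so setup()/cleanup()
> instead of configure()/close(); the parsing, the gradient, and the output
> format are only placeholders) is what I mean by option 2:
>
> import java.io.IOException;
> import java.util.ArrayList;
> import java.util.Collections;
> import java.util.List;
> import java.util.Random;
>
> import org.apache.hadoop.io.LongWritable;
> import org.apache.hadoop.io.NullWritable;
> import org.apache.hadoop.io.Text;
> import org.apache.hadoop.mapreduce.Mapper;
>
> public class LocalSgdMapper
>     extends Mapper<LongWritable, Text, NullWritable, Text> {
>
>   // All examples of this split are buffered here until cleanup().
>   private List<double[]> examples;
>
>   @Override
>   protected void setup(Context ctx) {
>     examples = new ArrayList<double[]>();
>   }
>
>   @Override
>   protected void map(LongWritable key, Text value, Context ctx) {
>     // Placeholder parsing: label first, then comma-separated features.
>     String[] parts = value.toString().split(",");
>     double[] row = new double[parts.length];
>     for (int j = 0; j < parts.length; j++) {
>       row[j] = Double.parseDouble(parts[j]);
>     }
>     examples.add(row);                 // just buffer, no learning yet
>   }
>
>   @Override
>   protected void cleanup(Context ctx)
>       throws IOException, InterruptedException {
>     if (examples.isEmpty()) return;
>     Collections.shuffle(examples, new Random(42));   // local shuffle
>     int dim = examples.get(0).length - 1;
>     double eta = 0.01;
>     double[] w = new double[dim];
>     for (double[] row : examples) {                  // one SGD pass
>       double y = row[0], margin = 0.0;
>       for (int j = 0; j < dim; j++) margin += w[j] * row[j + 1];
>       double p = 1.0 / (1.0 + Math.exp(-margin));
>       for (int j = 0; j < dim; j++) w[j] -= eta * (p - y) * row[j + 1];
>     }
>     StringBuilder sb = new StringBuilder();
>     for (double wj : w) sb.append(wj).append(' ');
>     // Same (null) key from every mapper, so a single reducer sees all
>     // the local weight vectors and can average them into v.
>     ctx.write(NullWritable.get(), new Text(sb.toString().trim()));
>   }
> }
>
> For point 1, I guess I could cap the split size in the driver with
> something like
>
>     FileInputFormat.setMaxInputSplitSize(job, 64L * 1024 * 1024);  // ~64 MB
>
> (org.apache.hadoop.mapreduce.lib.input.FileInputFormat) so that whatever
> gets buffered per split stays small enough for memory.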
>
> Is my method feasible?
>
