Yes. That is feasible. I think that you would have better luck with something like asynchronous SGD as described here:
http://machinelearning.wustl.edu/mlpapers/paper_files/NIPS2012_0598.pdf
and here: http://www.cs.toronto.edu/~fritz/absps/georgerectified.pdf

It would also be worth looking at some of the new Scala work in Mahout.
Map-reduce is a difficult medium for this art.

On Fri, Mar 28, 2014 at 5:21 AM, Li Li <[email protected]> wrote:
> I have read "Parallelized Stochastic Gradient Descent" (2010) by
> Martin A. Zinkevich et al.
> The parallel SGD algorithm is very simple:
>
> Define T = ⌊m/k⌋.
> Randomly partition the examples, giving T examples to each machine.
> for all i ∈ {1, ..., k} in parallel do
>     Randomly shuffle the data on machine i.
>     Initialize w_{i,0} = 0.
>     for all t ∈ {1, ..., T} do
>         Get the t-th example on the i-th machine (this machine), c_{i,t}
>         w_{i,t} ← w_{i,t−1} − η ∂_w c_{i,t}(w_{i,t−1})
>     end for
> end for
> Aggregate from all computers v = (1/k) Σ_{i=1..k} w_{i,T} and return v.
>
> It assumes that each machine does SGD optimization on its local data
> after randomly shuffling the data on that machine.
>
> It seems each machine has to load all of its local data into memory and
> shuffle it to perform SGD, and then the resulting models are averaged.
>
> How do I do this in Hadoop?
>
> 1. How do I control the Hadoop input split size?
>    Should I let Hadoop do this for me? Each split should not be so large
>    that it can't be loaded into memory.
> 2. Do it in a batch?
>    In the setUp method of the Mapper, construct a data structure to hold
>    all the data of this split;
>    in the map method, just add each record to this data structure;
>    in the close method, do the real work of SGD.
>
> Is my method feasible?
>
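To make the averaging step in that pseudocode concrete, here is a minimal, self-contained Java sketch: each partition shuffles its own examples, runs one pass of plain SGD from w = 0, and the final model is the average of the per-partition weight vectors. The linear model, squared loss, synthetic data, and learning rate are illustrative choices of mine, not anything prescribed by the paper or by Mahout.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Sketch of the SimuParallelSGD scheme: local SGD per partition, then average.
public class SimuParallelSgdSketch {

    static double[] localSgd(List<double[]> examples, int dim, double eta, long seed) {
        List<double[]> shuffled = new ArrayList<>(examples);
        Collections.shuffle(shuffled, new Random(seed));   // local shuffle, as in the pseudocode
        double[] w = new double[dim];                      // w_{i,0} = 0
        for (double[] ex : shuffled) {
            double y = ex[dim];                            // last column is the target
            double pred = 0;
            for (int j = 0; j < dim; j++) pred += w[j] * ex[j];
            double grad = pred - y;                        // gradient of 0.5*(pred - y)^2 w.r.t. pred
            for (int j = 0; j < dim; j++) w[j] -= eta * grad * ex[j];
        }
        return w;
    }

    static double[] average(List<double[]> weights, int dim) {
        double[] v = new double[dim];
        for (double[] w : weights)
            for (int j = 0; j < dim; j++) v[j] += w[j] / weights.size();
        return v;
    }

    public static void main(String[] args) {
        int dim = 2, k = 4;
        Random rnd = new Random(42);
        // Synthetic data: y = 2*x0 - 3*x1 plus noise, split into k partitions.
        List<List<double[]>> partitions = new ArrayList<>();
        for (int i = 0; i < k; i++) {
            List<double[]> part = new ArrayList<>();
            for (int n = 0; n < 1000; n++) {
                double x0 = rnd.nextDouble(), x1 = rnd.nextDouble();
                part.add(new double[]{x0, x1, 2 * x0 - 3 * x1 + 0.01 * rnd.nextGaussian()});
            }
            partitions.add(part);
        }
        // In a real job each partition would run on its own machine; here we just loop.
        List<double[]> perMachine = new ArrayList<>();
        for (int i = 0; i < k; i++) perMachine.add(localSgd(partitions.get(i), dim, 0.05, i));
        System.out.println("averaged weights: " + Arrays.toString(average(perMachine, dim)));
    }
}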

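On the two Hadoop questions: in the newer mapreduce API the hooks are setup()/map()/cleanup(), which correspond to the setUp/map/close methods you describe, and if memory serves the split size can be capped with mapred.max.split.size (mapreduce.input.fileinputformat.split.maxsize in the newer property names) so that a split fits in the mapper's heap. Below is a rough sketch of the "buffer the split, train in cleanup()" mapper. The input format (one dense "label f1 ... fd" line per record), the fixed dimension, and the learning rate are placeholder assumptions of mine, not a Mahout API; a single reducer (not shown) would average the emitted weight vectors to finish the aggregation step.

import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Sketch of the "buffer the whole split, run SGD once at the end" idea.
public class LocalSgdMapper extends Mapper<LongWritable, Text, NullWritable, Text> {

    private final List<double[]> buffer = new ArrayList<>();  // whole split held in memory
    private static final int DIM = 10;       // placeholder feature dimension
    private static final double ETA = 0.01;  // placeholder learning rate

    @Override
    protected void map(LongWritable key, Text value, Context ctx) {
        // Just parse and buffer; no learning happens per record.
        String line = value.toString().trim();
        if (line.isEmpty()) return;
        String[] parts = line.split("\\s+");
        double[] row = new double[parts.length];
        for (int j = 0; j < parts.length; j++) row[j] = Double.parseDouble(parts[j]);
        buffer.add(row);   // row[0] = label, row[1..] = features
    }

    @Override
    protected void cleanup(Context ctx) throws IOException, InterruptedException {
        // Shuffle locally and run one SGD pass, as in the quoted pseudocode.
        Collections.shuffle(buffer, new Random(42));
        double[] w = new double[DIM];
        for (double[] row : buffer) {
            double y = row[0], pred = 0;
            for (int j = 0; j < DIM && j + 1 < row.length; j++) pred += w[j] * row[j + 1];
            double grad = pred - y;   // squared-loss gradient, for illustration only
            for (int j = 0; j < DIM && j + 1 < row.length; j++) w[j] -= ETA * grad * row[j + 1];
        }
        // Emit this split's weight vector under a single key; one reducer averages them.
        StringBuilder sb = new StringBuilder();
        for (double wj : w) sb.append(wj).append(' ');
        ctx.write(NullWritable.get(), new Text(sb.toString().trim()));
    }
}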