I have read "Parallelized stochastic gradient descent" (2010) by
Martin A. Zinkevich et al.
The parallel SGD algorithm is quite simple:

Define T = ⌊m/k⌋
Randomly partition the examples, giving T examples to each machine.
for all i ∈ {1, ..., k} in parallel do
    Randomly shuffle the data on machine i.
    Initialize w_{i,0} = 0.
    for all t ∈ {1, ..., T} do
        Get the t-th example on the i-th machine (this machine), c_{i,t}
        w_{i,t} ← w_{i,t-1} − η ∂_w c_{i,t}(w_{i,t-1})
    end for
end for
Aggregate from all computers v = (1/k) Σ_{i=1}^{k} w_{i,T} and return v.
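To make the per-machine update concrete, here is a rough Java sketch of the inner loop (my own illustration, not code from the paper; it assumes a linear model with squared loss, dense double[] features, and a fixed learning rate eta):

import java.util.List;

public class LocalSgd {
    // SGD over one machine's (already shuffled) examples.
    // Squared loss c(w) = 0.5 * (w·x - y)^2, so ∂_w c = (w·x - y) * x.
    public static double[] localSgd(List<double[]> features, List<Double> labels, double eta) {
        int dim = features.get(0).length;
        double[] w = new double[dim];               // w_{i,0} = 0
        for (int t = 0; t < features.size(); t++) {
            double[] x = features.get(t);
            double error = dot(w, x) - labels.get(t);
            for (int j = 0; j < dim; j++) {
                w[j] -= eta * error * x[j];         // w_{i,t} = w_{i,t-1} - η * gradient
            }
        }
        return w;
    }

    private static double dot(double[] a, double[] b) {
        double s = 0.0;
        for (int j = 0; j < a.length; j++) s += a[j] * b[j];
        return s;
    }
}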

It assumes that each machine randomly shuffles its local data and then runs
SGD on that data locally.

It seems each machine has to load all of its local data into memory and
shuffle it in order to perform SGD, and then the resulting weight vectors
are averaged.

How can I do this in Hadoop?

1. How do I control the Hadoop input split size?
      Should I just let Hadoop decide this for me? Each split should be
small enough that it can be loaded into memory (see the split-size sketch
below).
2. Do the whole split as one batch per Mapper?
      In the Mapper's setup(), construct a data structure to hold all the
data of this split.
      In map(), just add each record to this data structure.
      In cleanup(), do the real SGD work (see the Mapper sketch below).
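
For point 1, I think the maximum split size can be capped when the Job is set up, something like this (untested sketch; the 64 MB cap and the class name SplitSizeConfig are just placeholders I picked):

import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class SplitSizeConfig {
    public static void configure(Job job) {
        // Cap each input split at 64 MB so that one Mapper's data fits in memory.
        // Pick the value based on the Mapper heap size; 64 MB is only an example.
        FileInputFormat.setMaxInputSplitSize(job, 64L * 1024 * 1024);
        // The equivalent property is mapred.max.split.size (older releases)
        // or mapreduce.input.fileinputformat.split.maxsize (newer releases).
    }
}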

Is my method feasible?
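
To make point 2 concrete, here is the kind of Mapper (plus an averaging Reducer, run with a single reduce task) that I have in mind. This is an untested sketch: the class names, the text input format (one example per line, label first, then space-separated features), the fixed learning rate 0.01, and the LocalSgd helper above are all my own placeholders.

import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class ParallelSgdJob {

    public static class SgdMapper
            extends Mapper<LongWritable, Text, NullWritable, Text> {

        private List<double[]> features;
        private List<Double> labels;

        @Override
        protected void setup(Context context) {
            // Build the in-memory buffers for this split.
            features = new ArrayList<double[]>();
            labels = new ArrayList<Double>();
        }

        @Override
        protected void map(LongWritable key, Text value, Context context) {
            // Buffer every example of this split; no output yet.
            String[] parts = value.toString().trim().split("\\s+");
            labels.add(Double.parseDouble(parts[0]));
            double[] x = new double[parts.length - 1];
            for (int j = 1; j < parts.length; j++) {
                x[j - 1] = Double.parseDouble(parts[j]);
            }
            features.add(x);
        }

        @Override
        protected void cleanup(Context context)
                throws IOException, InterruptedException {
            // Shuffle locally (same seed -> same permutation for both lists),
            // run SGD over the buffered split, and emit the weight vector.
            long seed = System.nanoTime();
            Collections.shuffle(features, new Random(seed));
            Collections.shuffle(labels, new Random(seed));
            double[] w = LocalSgd.localSgd(features, labels, 0.01); // eta is arbitrary here
            StringBuilder sb = new StringBuilder();
            for (double wj : w) sb.append(wj).append(' ');
            // Single key so one reducer sees every mapper's weight vector.
            context.write(NullWritable.get(), new Text(sb.toString().trim()));
        }
    }

    public static class AveragingReducer
            extends Reducer<NullWritable, Text, NullWritable, Text> {

        @Override
        protected void reduce(NullWritable key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            double[] sum = null;
            int k = 0;
            for (Text value : values) {
                String[] parts = value.toString().split("\\s+");
                if (sum == null) sum = new double[parts.length];
                for (int j = 0; j < parts.length; j++) {
                    sum[j] += Double.parseDouble(parts[j]);
                }
                k++;
            }
            // v = (1/k) * sum of the per-mapper weight vectors.
            StringBuilder sb = new StringBuilder();
            for (double s : sum) sb.append(s / k).append(' ');
            context.write(NullWritable.get(), new Text(sb.toString().trim()));
        }
    }
}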
