Hog Wild is intended for multi-core machines; with only 2 cores it doesn't much matter either way.
And GPUs rarely help with these problems because it is so easy to wind up I/O bound: SGD learners are fast.
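To make that concrete, the lock-free pattern Hector describes below is roughly the following. This is only a toy sketch, not Mahout's OnlineLogisticRegression: the "data" is random noise and the step size is arbitrary, so it demonstrates the threading pattern and nothing else.

    import java.util.Random;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    // Toy Hogwild-style SGD: several threads update one shared weight vector
    // with no synchronization at all. Occasional lost updates are tolerated
    // because each (sparse) update only touches a few coordinates.
    public class HogwildSketch {
      static final int FEATURES = 1000;
      static final double[] weights = new double[FEATURES]; // shared, unlocked

      public static void main(String[] args) throws InterruptedException {
        int threads = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
          final long seed = t;
          pool.execute(() -> {
            Random rnd = new Random(seed);
            double rate = 0.01;
            for (int n = 0; n < 100000; n++) {
              // fake sparse example: three active features (value 1), random label
              int[] idx = {rnd.nextInt(FEATURES), rnd.nextInt(FEATURES), rnd.nextInt(FEATURES)};
              double label = rnd.nextBoolean() ? 1.0 : 0.0;
              double dot = 0.0;
              for (int j : idx) {
                dot += weights[j];          // read whatever values are visible right now
              }
              double p = 1.0 / (1.0 + Math.exp(-dot));
              double err = label - p;       // logistic-loss gradient for unit feature values
              for (int j : idx) {
                weights[j] += rate * err;   // write back without any locking ("memory stomps")
              }
            }
          });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        System.out.println("w[0] = " + weights[0]);
      }
    }

On 2 cores there is little to gain from this either way; the point of the approach is to keep many cores busy, and the paper's argument is that when the updates are sparse the collisions between writers are rare enough not to hurt convergence.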
On Fri, Mar 16, 2012 at 6:43 PM, Lance Norskog <[email protected]> wrote:
> Is "Hog Wild" expected to be faster on 2 processors or on 20? If it
> is intended for many-processor machines, that may be a useful
> addition. These days 8 cores is at the knee of the price-performance
> curve for low-end servers.
>
> Are there gradient descent algorithms suitable for OpenCL GPU coding?
> GPU support seems like a hole in the Mahout suite, and would be very sexy
> for summer-of-code projects.
>
> On Fri, Mar 16, 2012 at 11:21 AM, Dmitriy Lyubimov <[email protected]> wrote:
> > I meant specifically the MR stuff, which is what SSGD seems to be aimed at.
> > On a single CPU, restarts or even simple CAS updates are not a problem,
> > as in the paper you've mentioned. There's no extra cost associated
> > with them. I think Mahout's single-node online SGD is already
> > SMP-parallelized (albeit it does so to figure out the best fit for the
> > regularization rate on a validation subset), as far as I remember. That's
> > different from the parallelization suggested in the Hog Wild algorithm,
> > but as long as we believe the work needs to be done and loads all CPUs,
> > there's probably not much of a win in using this or that approach to SMP
> > programming when they essentially produce the same quality result without
> > a meaningful improvement margin.
> >
> > On Thu, Mar 15, 2012 at 7:54 PM, Hector Yee <[email protected]> wrote:
> >> Have you read the lock-free Hog Wild paper? Just SGD with multiple
> >> threads, and don't be afraid of memory stomps. It works faster than batch.
> >> On Mar 15, 2012 2:32 PM, "Dmitriy Lyubimov" <[email protected]> wrote:
> >>
> >>> We already discussed the paper before. In fact, I had exactly the same
> >>> idea for partitioning the factorization task (something the authors
> >>> call "stratified" SGD) with stochastic learners before I ever saw
> >>> this paper.
> >>>
> >>> I personally lost interest in this approach even before I read the
> >>> paper because, the way I understood it at that time, it would have
> >>> required at least as many MR restarts with data exchange as the
> >>> degree of parallelism, and consequently just as many data passes. In
> >>> the framework of Mahout it is also difficult because Mahout doesn't
> >>> support blocking out of the box for its DRM format, so an additional
> >>> job may be required to pre-block the data the way they want to process
> >>> it -- or we have to run over 100% of it during each restart, instead
> >>> of a fraction of it.
> >>>
> >>> All in all, my speculation was that there was little chance this
> >>> approach would provide a win over the ALS techniques with restarts
> >>> that we already have at a mid to high degree of parallelization
> >>> (say 50-way parallelization and up).
> >>>
> >>> But honestly I would be happy to be wrong, because perhaps I did not
> >>> understand some of the work or did not see some of the optimizations
> >>> suggested. I would be especially happy if it could beat our current
> >>> ALS-WR by a meaningful margin on bigger data.
> >>>
> >>> -d
> >>>
> >>> On Sat, Jan 14, 2012 at 9:45 AM, Zeno Gantner <[email protected]> wrote:
> >>> > Hi list,
> >>> >
> >>> > I was talking to Isabel Drost in December, and we talked about a nice
> >>> > paper from last year's KDD conference that suggests a neat trick that
> >>> > allows doing SGD for matrix factorization in parallel.
> >>> >
> >>> > She said this would be interesting for some of you here.
> >>> >
> >>> > Here is the paper:
> >>> > http://www.mpi-inf.mpg.de/~rgemulla/publications/gemulla11dsgd.pdf
> >>> >
> >>> > Note that the authors themselves have already implemented it in Hadoop.
> >>> >
> >>> > Maybe someone would like to pick this up.
> >>> >
> >>> > I am still trying to find my way around the Mahout/Taste source code,
> >>> > so do not expect anything from me too soon ;-)
> >>> >
> >>> > Best regards,
> >>> > Zeno
>
> --
> Lance Norskog
> [email protected]
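For anyone skimming the Gemulla paper linked above: the "stratified" trick Dmitriy mentions boils down to the observation that two blocks of the rating matrix which share no users and no items can be trained at the same time. Here is a rough single-machine sketch of that schedule, just to illustrate the idea; the class and parameter names are made up and have nothing to do with the paper's Hadoop implementation or with Mahout's DRM blocking.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Random;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    // Sketch of the stratum schedule: users and items are each split into d
    // blocks; in sub-epoch s the d blocks (ub, (ub + s) % d) share no users and
    // no items, so their SGD updates run in parallel with no conflicts, and a
    // barrier separates the strata.
    public class StratifiedSgdSketch {

      static final class Rating {
        final int u, i;
        final double r;
        Rating(int u, int i, double r) { this.u = u; this.i = i; this.r = r; }
      }

      public static void main(String[] args) throws Exception {
        int d = 4;                                   // degree of parallelism
        int users = 1000, items = 800, rank = 10;
        double[][] U = randomFactors(users, rank);
        double[][] V = randomFactors(items, rank);
        List<Rating>[][] blocks = blockRatings(users, items, d);

        ExecutorService pool = Executors.newFixedThreadPool(d);
        for (int epoch = 0; epoch < 10; epoch++) {
          for (int s = 0; s < d; s++) {              // one stratum = d disjoint blocks
            List<Future<?>> running = new ArrayList<>();
            for (int ub = 0; ub < d; ub++) {
              List<Rating> block = blocks[ub][(ub + s) % d];
              running.add(pool.submit(() -> sgdPass(block, U, V, 0.01, 0.05)));
            }
            for (Future<?> f : running) {
              f.get();                               // barrier before the next stratum
            }
          }
        }
        pool.shutdown();
      }

      // Plain SGD over one block; no locking is needed because the blocks of a
      // stratum touch disjoint rows of U and V.
      static void sgdPass(List<Rating> block, double[][] U, double[][] V,
                          double rate, double lambda) {
        for (Rating cell : block) {
          double[] u = U[cell.u];
          double[] v = V[cell.i];
          double err = cell.r - dot(u, v);
          for (int k = 0; k < u.length; k++) {
            double uk = u[k];
            u[k] += rate * (err * v[k] - lambda * uk);
            v[k] += rate * (err * uk - lambda * v[k]);
          }
        }
      }

      static double dot(double[] a, double[] b) {
        double s = 0;
        for (int k = 0; k < a.length; k++) s += a[k] * b[k];
        return s;
      }

      static double[][] randomFactors(int n, int rank) {
        Random rnd = new Random(42);
        double[][] m = new double[n][rank];
        for (double[] row : m)
          for (int k = 0; k < rank; k++) row[k] = 0.1 * rnd.nextGaussian();
        return m;
      }

      @SuppressWarnings("unchecked")
      static List<Rating>[][] blockRatings(int users, int items, int d) {
        List<Rating>[][] blocks = new List[d][d];
        for (int a = 0; a < d; a++)
          for (int b = 0; b < d; b++) blocks[a][b] = new ArrayList<Rating>();
        Random rnd = new Random(1);
        for (int n = 0; n < 50000; n++) {            // synthetic ratings, blocked by (u % d, i % d)
          int u = rnd.nextInt(users), i = rnd.nextInt(items);
          blocks[u % d][i % d].add(new Rating(u, i, 1 + rnd.nextInt(5)));
        }
        return blocks;
      }
    }

The distributed version runs the same schedule with a data pass per stratum, which is roughly the restart and data-exchange cost Dmitriy is worried about above.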
