Thanks for the link, Chris. I haven't made it through everything (the proof technique is pretty advanced), but there do seem to be a few significant limitations for practical applications. Two I noticed are:
(1) The assumption that the data is present across machines in a randomized order. Definitely not realistic on a standard Hadoop installation.

(2) The assumption that both the cost function and its gradient are strictly bounded. Not true for lots of interesting models, including linear regression.

Still very cool work (as is to be expected, given the authors).

— John

On Jun 9, 2014, at 10:53 AM, Christopher Fusting <[email protected]> wrote:

> The results in the paper from which this algorithm was implemented are
> encouraging:
> http://www.research.rutgers.edu/~lihong/pub/Zinkevich11Parallelized.pdf
>
> The proof is a bit beyond me, so I cannot vouch for the theory. I'm excited
> to test this on some non-trivial problems to see how it fares.
>
> _Chris
>
> On Monday, June 9, 2014 12:19:39 PM UTC-4, John Myles White wrote:
>
> My question is about the theory behind your algorithm. My understanding is
> that no parallel SGD implementation (except one that trivially runs on the
> same data) will produce correct results in general. Is that not true?
>
> -- John
>
> On Jun 9, 2014, at 9:07 AM, Christopher Fusting <[email protected]> wrote:
>
>> John,
>>
>> There has been no rigorous testing yet. My primary concerns with the
>> averaging algorithm are process latency, completion time, and faults.
>> Do you have specifics you would like to share?
>>
>> _Chris
>>
>> On Mon, Jun 9, 2014 at 11:24 AM, John Myles White <[email protected]> wrote:
>>
>> Very cool, Chris.
>>
>> I’ve done a lot of work on SGD in Julia, so I’m glad to see more.
>>
>> Regarding the averaging technique you’re using, have you done much testing
>> to see how well it works? My sense is that the algorithm you’re using is a
>> little brittle, but perhaps I’ve misunderstood it.
>>
>> — John
>>
>> On Jun 8, 2014, at 11:36 AM, Christopher Fusting <[email protected]> wrote:
>>
>>> Hi everyone. I've been playing around with Julia for a while now and have
>>> implemented Parallel Stochastic Gradient Descent.
>>> This is my first Julia project (and attempt at implementing this
>>> algorithm), so it's not perfect, but I think I have a good start and
>>> wanted to share it: https://github.com/cfusting/PSGD.jl. I welcome any
>>> feedback.
>>>
>>> Eventually I'd like to integrate the package with DataFrames and do a
>>> little optimization, especially on the algorithms that partition the data.
>>>
>>> _Chris
>>
>> --
>> Christopher W. Fusting
>> Software Developer / Analyst
>>
>> @cfusting
>> 828-772-0012
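[Editor's note: for readers skimming the thread, the averaging scheme under discussion, from the Zinkevich et al. paper linked above, can be sketched in a few lines. This is a minimal single-process Python illustration with toy data; all names and parameters are mine, and it is not code from PSGD.jl. The paper's idea: shuffle the data, split it into k partitions, run independent SGD on each partition, then take the plain average of the resulting parameter vectors.]

```python
import random

def sgd(data, eta=0.05, epochs=100, w0=0.0):
    """Plain SGD for the 1-D model y ≈ w * x with squared loss."""
    w = w0
    for _ in range(epochs):
        for x, y in data:
            grad = 2 * (w * x - y) * x  # d/dw of (w*x - y)^2
            w -= eta * grad
    return w

def parallel_sgd(data, k=4, seed=0):
    """Shuffle, split into k partitions, run SGD on each, average the results."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)               # the paper assumes randomized placement
    parts = [shuffled[i::k] for i in range(k)]
    ws = [sgd(part) for part in parts]  # independent runs -> trivially parallel
    return sum(ws) / k                  # final model is the plain average

# Toy data from y = 3x; the averaged model should land near w = 3.
data = [(x, 3.0 * x) for x in (0.1 * i for i in range(1, 21))]
print(parallel_sgd(data))
```

[Note how John's first objection shows up here: the explicit `shuffle` step does the randomization that a standard Hadoop installation would not give you for free.]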
