Hello everyone!

Yannis and I have spent the past few days discussing how to implement parallel SGD using the ideas presented in the HOGWILD! paper (https://arxiv.org/pdf/1106.5730.pdf). We also have at hand the authors' implementation of the algorithm (http://i.stanford.edu/hazy/victor/Hogwild/) and a talk given by one of the authors (https://www.youtube.com/watch?v=l5JqUvTdZts); the part relevant to the parallel algorithm starts at about 26 minutes.

Overall, we have found that the algorithm itself, with its free-for-all update model and simple description of how work is shared among the processors, is not very complex. The paper describes the loss function to be minimised as having a hypergraph structure: each term of the loss depends on only a small subset (a hyperedge) of the coordinates. However, a sizeable amount of complexity goes into implementing the description of this loss function, which depends on the particular problem being solved by the optimizer. The iterative procedure of the algorithm, which every processor runs, uses parts of this hypergraph structure.

Our main concern right now is to separate this complexity (defining the loss function) from the implementation of the optimizer, while keeping the optimizer reasonably easy to use. Do we need to introduce another function interface (SparseFunctionType) that lets the optimizer maintain and query this structure? The authors' implementation is quite difficult to understand, so any insights would be helpful.
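To make the question concrete, here is a minimal C++ sketch of one possible shape for such an interface, assuming the loss decomposes as f(x) = sum over hyperedges e of f_e(x_e). All names here (the SparseFunctionType template parameter, NumFunctions(), Support(), Gradient(), SparseLeastSquares, AtomicAdd, HogwildSGD) are hypothetical and are not existing mlpack API. The function class only reports each term's hyperedge and the gradient restricted to it; the optimizer owns the threads, reads x_e, and applies the update to each coordinate in e without locks. Component-wise updates use a std::atomic<double> compare-and-swap loop instead of plain unsynchronised writes, since the latter are a data race in C++.

```cpp
#include <atomic>
#include <cstddef>
#include <random>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical SparseFunctionType instance: sparse least squares,
// f_i(x) = 0.5 * (a_i . x_{e_i} - b_i)^2, where e_i is term i's hyperedge.
class SparseLeastSquares {
 public:
  SparseLeastSquares(std::vector<std::vector<std::size_t>> support,
                     std::vector<std::vector<double>> coeffs,
                     std::vector<double> targets)
      : support_(std::move(support)), coeffs_(std::move(coeffs)),
        targets_(std::move(targets)) {}

  std::size_t NumFunctions() const { return support_.size(); }

  // The hyperedge e_i: the only coordinates term i reads or writes.
  const std::vector<std::size_t>& Support(std::size_t i) const {
    return support_[i];
  }

  // Gradient of f_i with respect to x_{e_i}; xe holds x restricted to e_i.
  void Gradient(std::size_t i, const std::vector<double>& xe,
                std::vector<double>& ge) const {
    double r = -targets_[i];
    for (std::size_t k = 0; k < xe.size(); ++k) r += coeffs_[i][k] * xe[k];
    ge.resize(xe.size());
    for (std::size_t k = 0; k < xe.size(); ++k) ge[k] = r * coeffs_[i][k];
  }

 private:
  std::vector<std::vector<std::size_t>> support_;
  std::vector<std::vector<double>> coeffs_;
  std::vector<double> targets_;
};

// Atomically performs x += delta on a single coordinate (no global lock).
inline void AtomicAdd(std::atomic<double>& x, double delta) {
  double old = x.load(std::memory_order_relaxed);
  while (!x.compare_exchange_weak(old, old + delta,
                                  std::memory_order_relaxed)) {
    // On failure, `old` is refreshed with the current value; retry.
  }
}

// HOGWILD!-style SGD sketch: each thread repeatedly samples a term i,
// reads x_{e_i} (possibly stale), evaluates the gradient on e_i, and
// updates only the coordinates in e_i, with no locking between threads.
template <typename SparseFunctionType>
std::vector<double> HogwildSGD(const SparseFunctionType& f,
                               const std::vector<double>& x0, double stepSize,
                               std::size_t itersPerThread,
                               unsigned numThreads) {
  std::vector<std::atomic<double>> x(x0.size());
  for (std::size_t j = 0; j < x0.size(); ++j) x[j].store(x0[j]);

  auto worker = [&](unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<std::size_t> pick(0, f.NumFunctions() - 1);
    std::vector<double> xe, ge;
    for (std::size_t t = 0; t < itersPerThread; ++t) {
      const std::size_t i = pick(rng);
      const std::vector<std::size_t>& e = f.Support(i);
      xe.resize(e.size());
      for (std::size_t k = 0; k < e.size(); ++k)
        xe[k] = x[e[k]].load(std::memory_order_relaxed);
      f.Gradient(i, xe, ge);
      for (std::size_t k = 0; k < e.size(); ++k)
        AtomicAdd(x[e[k]], -stepSize * ge[k]);
    }
  };

  std::vector<std::thread> threads;
  for (unsigned p = 0; p < numThreads; ++p) threads.emplace_back(worker, p + 1);
  for (std::thread& th : threads) th.join();

  std::vector<double> result(x.size());
  for (std::size_t j = 0; j < x.size(); ++j) result[j] = x[j].load();
  return result;
}
```

For example, with the terms x0 = 1, x1 = 2, x2 = 3 and x0 + x1 = 3 (supports {0}, {1}, {2}, {0, 1}), calling HogwildSGD(f, {0, 0, 0}, 0.1, 2000, 4) should approach (1, 2, 3). The point of this split is that a problem-specific loss only has to describe its own sparsity, while all threading stays inside the optimizer.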
Thanks -- Shikhar Bhardwaj
_______________________________________________
mlpack mailing list
http://knife.lugatgt.org/cgi-bin/mailman/listinfo/mlpack
