Hello Shikhar,

Apologies for missing your email. My inbox is exploding these days.

On Tue, Mar 28, 2017 at 6:00 PM, Shikhar Bhardwaj <[email protected]> wrote:

> Hello everyone!
>
> I have been looking through the mlpack codebase to find sections which
> could benefit from a multi threaded implementation. A couple of papers also
> shed some light on the implementation of machine learning algorithms in a
> multi threaded setting :
>
> 1. Map-Reduce for Machine Learning on Multicore
> <https://papers.nips.cc/paper/3150-map-reduce-for-machine-learning-on-multicore.pdf>
> 2. Parallelizing Machine Learning Algorithms
> <http://cs229.stanford.edu/proj2010/BatizBenetSlackSparksYahya-ParallelizingMachineLearningAlgorithms.pdf>
>

That was an interesting read, thank you! I think it would be worth seeing how we can mimic the Map-Reduce model with SPMD structures... I'll read the paper again with a clearer head over the weekend.

>
> I have listed out some algorithms which can be implemented in this manner.
>
> One idea that I had was to parallelize testing. Currently, mlpack builds a
> single mlpack_test executable, which runs the tests on a single thread.
> Instead, we can build multiple test executables, and use CMake's ctest tool
> to run those tests, with as many jobs as the number of extra threads we
> have to spare. More on this here
> <https://baptiste-wicht.com/posts/2012/10/run-boost-test-parallel-cmake.html>.
> This can significantly reduce testing time, and help in reducing the time
> for the complete matrix builds planned in the future.
>

I like the idea of parallelizing our tests, with the caveats Ryan and Marcus have already mentioned. I would also like to mention that running parallel algorithms in parallel (with "ctest -j 4", for example) should be avoided, or we may end up slower instead of faster.

There is a way to disable nested parallelism within OpenMP, but if we introduce a different level of parallelism we need to make sure we either disable OpenMP entirely or run it with 1 computational thread - in which case running the parallelization tests themselves might become tricky. That is mainly because (out of laziness and for simplicity) my method of testing parallelization in the past was to set the number of computational threads to 1, run a test, then set it to 4 and run the test again to compare results :) You can take a look at lsh_test/ParallelBichromatic for an example of this.

>
> This wouldn't interfere with the aim of having a single, verifiable
> command for users to test the library, as mentioned in issue #137
> <https://github.com/mlpack/mlpack/issues/137>.
>
> Any thoughts?
>
> Thanks,
> Shikhar
>
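
For illustration, the "run with 1 thread, then with 4, and compare" check described above might look roughly like the sketch below. This is not mlpack code: SquareAll, RunWithThreads and SequentialMatchesParallel are hypothetical names, SquareAll is only a stand-in for a real OpenMP-parallelized algorithm, and the actual lsh_test/ParallelBichromatic test may well be structured differently. The only OpenMP calls used are omp_get_max_threads() and omp_set_num_threads(), and they are guarded so that the code still compiles when OpenMP is disabled.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical stand-in for an OpenMP-parallelized algorithm: squares every
// element. Each iteration writes only to its own output slot, so the result
// does not depend on how the iterations are split across threads.
std::vector<double> SquareAll(const std::vector<double>& input)
{
  std::vector<double> output(input.size());
  const long n = static_cast<long>(input.size());
  #pragma omp parallel for
  for (long i = 0; i < n; ++i)
    output[i] = input[i] * input[i];
  return output;
}

// Runs SquareAll() with the requested number of OpenMP threads, then restores
// the previous setting. Without OpenMP the thread count is simply ignored.
std::vector<double> RunWithThreads(const std::vector<double>& input,
                                   const int threads)
{
  assert(threads > 0);
#ifdef _OPENMP
  const int previous = omp_get_max_threads();
  omp_set_num_threads(threads);
#else
  (void) threads;
#endif
  std::vector<double> result = SquareAll(input);
#ifdef _OPENMP
  omp_set_num_threads(previous);
#endif
  return result;
}

// The comparison itself: a 1-thread run and a 4-thread run must agree.
// Exact equality is fine here because the work is element-wise; an algorithm
// that uses a floating-point reduction would need a tolerance instead.
bool SequentialMatchesParallel(const std::vector<double>& input)
{
  return RunWithThreads(input, 1) == RunWithThreads(input, 4);
}
```

A test case would call SequentialMatchesParallel() on some input and check that it returns true. Note that if such a test is launched under "ctest -j 4" while OpenMP is also using 4 threads, the oversubscription problem mentioned above applies, so the outer level of parallelism would need to be taken into account.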
_______________________________________________
mlpack mailing list
[email protected]
http://knife.lugatgt.org/cgi-bin/mailman/listinfo/mlpack
