Re: [mlpack] Regarding profiling for parallelization

Yannis Mentekidis Thu, 30 Mar 2017 11:03:06 -0700

Hello Shikhar,

Apologies for missing your email. My inbox is exploding these days.

On Tue, Mar 28, 2017 at 6:00 PM, Shikhar Bhardwaj <
[email protected]> wrote:

> Hello everyone!
>
> I have been looking through the mlpack codebase to find sections which
> could benefit from a multithreaded implementation. A couple of papers also
> shed some light on the implementation of machine learning algorithms in a
> multithreaded setting:
>
> 1. Map-Reduce for Machine Learning on Multicore
> <https://papers.nips.cc/paper/3150-map-reduce-for-machine-learning-on-multicore.pdf>
> 2. Parallelizing Machine Learning Algorithms
> <http://cs229.stanford.edu/proj2010/BatizBenetSlackSparksYahya-ParallelizingMachineLearningAlgorithms.pdf>
>

That was an interesting read, thank you!

I believe it would be worth exploring how we can mimic the Map-Reduce
model with SPMD structures... I'll read the paper again with a clearer head
over the weekend.

>
> I have listed out some algorithms which can be implemented in this manner.
>
> One idea that I had was to parallelize testing. Currently, mlpack builds a
> single mlpack_test executable, which runs the tests on a single thread.
> Instead, we can build multiple test executables, and use CMake's ctest tool
> to run those tests, with as many jobs as the number of extra threads we
> have to spare. More on this here
> <https://baptiste-wicht.com/posts/2012/10/run-boost-test-parallel-cmake.html>.
> This can significantly reduce testing time and help shorten the complete
> matrix builds planned for the future.
>

I like the idea of parallelizing our tests, with the caveats Ryan and
Marcus have already mentioned. I would also like to point out that running
parallel algorithms in parallel (with "ctest -j 4", for example) should be
avoided: the threads would compete for the same cores, and we could end up
slower instead of faster.

There is a way to disable nested parallelism within OpenMP, but if we
introduce another level of parallelism we need to make sure we either
disable OpenMP entirely or run it with 1 computational thread, in which
case running the parallelization tests themselves might become tricky.
That is mainly because (out of laziness and for simplicity) my method of
testing parallelization in the past was to set the number of computational
threads to 1, run a test, then set it to 4 and run it again to compare
results :) You can take a look at lsh_test/ParallelBichromatic for an
example of this.

>
> This wouldn't interfere with the aim of having a single, verifiable
> command for users to test the library, as mentioned in issue #137
> <https://github.com/mlpack/mlpack/issues/137>,.
>
> Any thoughts?
>
> Thanks,
> Shikhar
>
> _______________________________________________
> mlpack mailing list
> [email protected]
> http://knife.lugatgt.org/cgi-bin/mailman/listinfo/mlpack
>
