Re: [mlpack] Regarding profiling for parallelization

Shikhar Bhardwaj Thu, 30 Mar 2017 12:13:32 -0700

Hi Yannis,

Thanks a lot for the detailed reply.


I had given some thought to the problem of oversubscribing hardware threads
if we were to run the tests for already-parallelized methods in parallel
with each other. In the model that I am proposing, each test suite would
have its own executable, which ctest can run in parallel with the other
test suites. ctest has a test property called RUN_SERIAL
<https://cmake.org/cmake/help/v2.8.12/cmake.html#prop_test:RUN_SERIAL>
that prevents a test from being run in parallel with any other test. In
this way, we could mark the tests for already-parallelized methods so that
they are never run alongside other tests.
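
A minimal sketch of what this could look like in CMake is below. The target
and source file names are placeholders chosen for illustration (mlpack
currently builds a single mlpack_test), and linking against mlpack is
omitted, so treat this as an outline of the idea rather than a working build
file.

```cmake
# Hypothetical layout: one executable per test suite.
enable_testing()

# A suite that exercises OpenMP-parallel code (placeholder file name).
add_executable(lsh_test lsh_test.cpp)
add_test(NAME lsh_test COMMAND lsh_test)

# A suite with no internal parallelism (placeholder name).
add_executable(some_serial_test some_serial_test.cpp)
add_test(NAME some_serial_test COMMAND some_serial_test)

# Keep the already-parallelized suite from overlapping with any other test,
# so its OpenMP threads do not compete with other ctest jobs.
set_tests_properties(lsh_test PROPERTIES RUN_SERIAL TRUE)
```

With this in place, "ctest -j 4" would still run the remaining suites
concurrently, while lsh_test would get the machine to itself.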


On Thu, Mar 30, 2017 at 11:32 PM, Yannis Mentekidis <[email protected]>
wrote:

> Hello Shikhar,
>
> Apologies for missing your email. My inbox is exploding these days.
>
> On Tue, Mar 28, 2017 at 6:00 PM, Shikhar Bhardwaj <
> [email protected]> wrote:
>
>> Hello everyone!
>>
>> I have been looking through the mlpack codebase to find sections which
>> could benefit from a multithreaded implementation. A couple of papers also
>> shed some light on the implementation of machine learning algorithms in a
>> multithreaded setting:
>>
>> 1. Map-Reduce for Machine Learning on Multicore
>> <https://papers.nips.cc/paper/3150-map-reduce-for-machine-learning-on-multicore.pdf>
>> 2. Parallelizing Machine Learning Algorithms
>> <http://cs229.stanford.edu/proj2010/BatizBenetSlackSparksYahya-ParallelizingMachineLearningAlgorithms.pdf>
>>
>
> That was an interesting read, thank you!
>
> I believe it would be interesting to see how we can mimic the Map-Reduce
> model with SPMD structures... I'll read the paper again with a clearer head
> over the weekend.
>
>
>
>>
>> I have listed out some algorithms which can be implemented in this manner.
>>
>> One idea that I had was to parallelize testing. Currently, mlpack builds
>> a single mlpack_test executable, which runs the tests on a single thread.
>> Instead, we can build multiple test executables, and use CMake's ctest tool
>> to run those tests, with as many jobs as the number of extra threads we
>> have to spare. More on this here
>> <https://baptiste-wicht.com/posts/2012/10/run-boost-test-parallel-cmake.html>.
>> This can significantly reduce testing time, and help in reducing the time
>> for the complete matrix builds planned in the future.
>>
>
> I like the idea of parallelizing our tests, with the caveats Ryan and
> Marcus have already mentioned. I would like to also mention that running
> parallel algorithms in parallel (with "ctest -j 4" for example) should be
> avoided if we don't want to end up being slower instead of faster.
>
> There is a way to disable nested parallelism within OpenMP, but if we
> introduce a different level of parallelism we need to make sure we disable
> OpenMP entirely, or run it with 1 computational thread - in which case,
> running the parallelization tests themselves might now become tricky.
> That is mainly because (out of laziness and for simplicity) my method of
> testing parallelization in the past was setting computational threads to 1,
> running a test, then setting it to 4, and running it again to compare
> results :) You can take a look at lsh_test/ParallelBichromatic for an
> example of this.
>
>
>
>>
>> This wouldn't interfere with the aim of having a single, verifiable
>> command for users to test the library, as mentioned in issue #137
>> <https://github.com/mlpack/mlpack/issues/137>.
>>
>> Any thoughts?
>>
>> Thanks,
>> Shikhar
>>
>> _______________________________________________
>> mlpack mailing list
>> [email protected]
>> http://knife.lugatgt.org/cgi-bin/mailman/listinfo/mlpack
>>
>
>

