Re: [mlpack] Regarding profiling for parallelization

Yannis Mentekidis Thu, 30 Mar 2017 13:04:08 -0700

My mistake, Shikhar, I didn't get that far down the documentation.

That sorts that worry out then! Cool :)


On Thu, Mar 30, 2017 at 8:13 PM, Shikhar Bhardwaj <
[email protected]> wrote:

> Hi Yannis,
>
> Thanks a lot for the detailed reply.
>
> I had given some thought to the problem of oversubscribing hardware
> threads during testing if we were to run tests of already parallelized
> methods in parallel. In the model that I am proposing, each test suite would
> have its own executable, which ctest can run in parallel with the other test
> suites. ctest has a test property, RUN_SERIAL
> <https://cmake.org/cmake/help/v2.8.12/cmake.html#prop_test:RUN_SERIAL>,
> that keeps a test from running in parallel with any other test.
> This way, we could mark the tests of already parallelized methods so that
> they never run alongside other tests.
>
>
> On Thu, Mar 30, 2017 at 11:32 PM, Yannis Mentekidis <[email protected]>
> wrote:
>
>> Hello Shikhar,
>>
>> Apologies for missing your email. My inbox is exploding these days.
>>
>> On Tue, Mar 28, 2017 at 6:00 PM, Shikhar Bhardwaj <
>> [email protected]> wrote:
>>
>>> Hello everyone!
>>>
>>> I have been looking through the mlpack codebase to find sections which
>>> could benefit from a multi-threaded implementation. A couple of papers also
>>> shed some light on the implementation of machine learning algorithms in a
>>> multi-threaded setting:
>>>
>>> 1. Map-Reduce for Machine Learning on Multicore
>>> <https://papers.nips.cc/paper/3150-map-reduce-for-machine-learning-on-multicore.pdf>
>>> 2. Parallelizing Machine Learning Algorithms
>>> <http://cs229.stanford.edu/proj2010/BatizBenetSlackSparksYahya-ParallelizingMachineLearningAlgorithms.pdf>
>>>
>>
>> That was an interesting read, thank you!
>>
>> I believe it would be interesting to see how we can mimic the Map-Reduce
>> model with SPMD structures... I'll read the paper again with a clearer head
>> over the weekend.
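
One possible reading of "Map-Reduce with SPMD structures" is sketched below, assuming the computation can be written as a sum over data points. Each OpenMP thread accumulates a private partial sum over its share of the data (the "map" step), and the reduction clause combines the partial sums (the "reduce" step). SumOfSquares is a hypothetical illustration, not an mlpack function, and the pragma is simply ignored if the file is compiled without OpenMP support.

```cpp
#include <vector>

// Hypothetical example: a sum over data points, computed map-reduce style.
// "Map": each thread adds up the squares of the elements it is assigned.
// "Reduce": OpenMP combines the per-thread partial sums into `total`.
double SumOfSquares(const std::vector<double>& data)
{
  double total = 0.0;
  const long n = static_cast<long>(data.size());

  #pragma omp parallel for reduction(+:total)
  for (long i = 0; i < n; ++i)
    total += data[i] * data[i];

  return total;
}
```

Note that with floating-point data the order of additions depends on the number of threads, so results from different thread counts may differ in the last few bits unless the values are exactly representable.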
>>
>>
>>
>>>
>>> I have listed out some algorithms which can be implemented in this
>>> manner.
>>>
>>> One idea that I had was to parallelize testing. Currently, mlpack builds
>>> a single mlpack_test executable, which runs the tests on a single thread.
>>> Instead, we can build multiple test executables, and use CMake's ctest tool
>>> to run those tests, with as many jobs as the number of extra threads we
>>> have to spare. More on this here
>>> <https://baptiste-wicht.com/posts/2012/10/run-boost-test-parallel-cmake.html>.
>>> This can significantly reduce testing time, and help in reducing the time
>>> for the complete matrix builds planned in the future.
>>>
>>
>> I like the idea of parallelizing our tests, with the caveats Ryan and
>> Marcus have already mentioned. I would like to also mention that running
>> parallel algorithms in parallel (with "ctest -j 4" for example) should be
>> avoided if we don't want to end up being slower instead of faster.
>>
>> There is a way to disable nested parallelism within OpenMP, but if we
>> introduce a different level of parallelism we need to make sure we disable
>> OpenMP entirely, or run it with 1 computational thread - in which case,
>> running the parallelization tests themselves might now become tricky.
>> That is mainly because (out of laziness and for simplicity) my method of
>> testing parallelization in the past was setting computational threads to 1,
>> running a test, then setting it to 4, and running it again to compare
>> results :) You can take a look at lsh_test/ParallelBichromatic for an
>> example of this.
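
The "run with 1 thread, run with 4 threads, compare" approach described above could look roughly like the following. This is a minimal sketch, not the code in lsh_test: SquareAll and SameResultForThreadCounts are hypothetical names, and SquareAll stands in for a parallelized method. The `_OPENMP` guards let the sketch compile with or without OpenMP enabled.

```cpp
#include <cassert>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical stand-in for a parallelized method. Each iteration writes
// only its own output slot, so the result must not depend on thread count.
std::vector<double> SquareAll(const std::vector<double>& input, int numThreads)
{
  assert(numThreads >= 1);
#ifdef _OPENMP
  omp_set_num_threads(numThreads); // Threads for the next parallel region.
#else
  (void) numThreads; // Built without OpenMP: the loop below runs serially.
#endif

  std::vector<double> output(input.size());
  const long n = static_cast<long>(input.size());

  #pragma omp parallel for
  for (long i = 0; i < n; ++i)
    output[i] = input[i] * input[i];

  return output;
}

// Run once with 1 thread and once with 4 threads, then compare the results.
bool SameResultForThreadCounts(const std::vector<double>& input)
{
  return SquareAll(input, 1) == SquareAll(input, 4);
}
```

A test written this way sets the thread count itself, which is why it could clash with an outer level of parallelism such as "ctest -j 4" unless it is kept from running alongside other tests.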
>>
>>
>>
>>>
>>> This wouldn't interfere with the aim of having a single, verifiable
>>> command for users to test the library, as mentioned in issue #137
>>> <https://github.com/mlpack/mlpack/issues/137>.
>>>
>>> Any thoughts?
>>>
>>> Thanks,
>>> Shikhar
>>>
>>> _______________________________________________
>>> mlpack mailing list
>>> [email protected]
>>> http://knife.lugatgt.org/cgi-bin/mailman/listinfo/mlpack
>>>
>>
>>
>

