Re: [mlpack] Regarding profiling for parallelization

Shikhar Bhardwaj Thu, 30 Mar 2017 06:39:02 -0700

Thanks for the reply, Ryan.

The paper uses the map-reduce model of computation but does not require it.
The algorithmic requirements of the parallelized algorithms are quite
easily satisfied by the implementations in mlpack. So, the parallel
implementations in mlpack can be based on OpenMP's SPMD model. To quote the
paper,

"This paper’s contributions are:
(i) We show that any algorithm fitting the Statistical Query Model may
be written in a certain “summation form.” This form does not change the
underlying algorithm and so is not an approximation, but is instead an
exact implementation.
(ii) The summation form does not depend on, but can be easily expressed in
a map-reduce [7] framework which is easy to program in.
..."

Regarding the extraction of test suites and test cases from the tests in
the directory, I think we can adopt the same technique that is already
used for new executable targets: updating the CMakeLists.txt in the test
directory with the test names whenever a new test is added or an old one
is modified.

I agree this is more cumbersome and difficult to maintain. I'll look into
automated tools which can help us here, or some kind of CMake script which
can enumerate the test executables for us.

Thanks,
Shikhar

On Thu, Mar 30, 2017 at 6:32 PM, Ryan Curtin <[email protected]> wrote:

> On Tue, Mar 28, 2017 at 10:30:49PM +0530, Shikhar Bhardwaj wrote:
> > Hello everyone!
> >
> > I have been looking through the mlpack codebase to find sections which
> > could benefit from a multi-threaded implementation. A couple of papers
> > also shed some light on the implementation of machine learning
> > algorithms in a multi-threaded setting:
> >
> > 1. Map-Reduce for Machine Learning on Multicore
> > <https://papers.nips.cc/paper/3150-map-reduce-for-machine-learning-on-multicore.pdf>
> > 2. Parallelizing Machine Learning Algorithms
> > <http://cs229.stanford.edu/proj2010/BatizBenetSlackSparksYahya-ParallelizingMachineLearningAlgorithms.pdf>
> >
> > I have listed out some algorithms which can be implemented in this
> > manner.
> >
> > One idea that I had was to parallelize testing. Currently, mlpack builds
> > a single mlpack_test executable, which runs the tests on a single thread.
> > Instead, we can build multiple test executables, and use CMake's ctest
> > tool to run those tests, with as many jobs as the number of extra threads
> > we have to spare. More on this here
> > <https://baptiste-wicht.com/posts/2012/10/run-boost-test-parallel-cmake.html>.
> > This can significantly reduce testing time, and help in reducing the time
> > for the complete matrix builds planned in the future.
> >
> > This wouldn't interfere with the aim of having a single, verifiable
> > command for users to test the library, as mentioned in issue #137
> > <https://github.com/mlpack/mlpack/issues/137>.
> >
> > Any thoughts?
>
> Hi Shikhar,
>
> I'm not sure if a map-reduce approach would be the best way to go.  This
> would probably involve some significant restructuring, but ideally we
> should try to keep the code as easy to read as possible (I know this can
> be hard with C++ but we can at least try).
>
> I like the idea of parallel testing like you suggested, but we would need
> a way to extract all of the test cases or test suites from the code---we
> should avoid having a file that contains the names of all test files,
> because it will be very easy for that to go out of date.
>
> Thanks,
>
> Ryan
>
> --
> Ryan Curtin    | "Rock-and-rollers don't bathe."
> [email protected] |   - Sneed
>

_______________________________________________
mlpack mailing list
[email protected]
http://knife.lugatgt.org/cgi-bin/mailman/listinfo/mlpack
