Re: [mlpack] Regarding profiling for parallelization

Hrishikesh Menon Thu, 30 Mar 2017 06:33:52 -0700

Hi,

Sorry to butt in to the conversation, but on Linux (or any Unix-based
system), at least, extracting all test cases should be easy, as long as
they are in the same folder (like they are now). I believe calling ls
through system(), or maybe


FILE* file = popen("ls", "r");

Using this code snippet, one can create a new file with the names of
the tests every time the parallelization program is run.
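
To make the idea a bit more concrete, here is a rough sketch of how the
output of popen() could be read and filtered. It is only an illustration:
the helper names EndsWith and ListTestFiles are made up for this example,
and the "_test.cpp" suffix and the directory passed to ls are assumptions
about how the test files are named and laid out. popen() and pclose() are
POSIX calls, so this would not work as-is on Windows.

```cpp
#include <cassert>
#include <cstdio>     // popen, pclose (POSIX), fgets
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: true if `name` ends with `suffix`.
bool EndsWith(const std::string& name, const std::string& suffix)
{
  return name.size() >= suffix.size() &&
      name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Hypothetical helper: runs `command` through popen() and returns every
// output line that ends with `suffix` (assumed here to mark a test file).
std::vector<std::string> ListTestFiles(const std::string& command,
                                       const std::string& suffix)
{
  assert(!command.empty());

  FILE* pipe = popen(command.c_str(), "r");
  if (pipe == nullptr)
    throw std::runtime_error("popen() failed");

  std::vector<std::string> names;
  std::string line;
  char buffer[256];
  // fgets() may return a long line in several pieces, so accumulate until
  // a newline is seen.
  while (fgets(buffer, sizeof(buffer), pipe) != nullptr)
  {
    line += buffer;
    if (!line.empty() && line.back() == '\n')
    {
      line.pop_back();
      if (EndsWith(line, suffix))
        names.push_back(line);
      line.clear();
    }
  }
  // Handle a final line that was not terminated by a newline.
  if (!line.empty() && EndsWith(line, suffix))
    names.push_back(line);

  pclose(pipe);
  return names;
}
```

Something like ListTestFiles("ls src/mlpack/tests", "_test.cpp") would then
give the list of test files at run time (the path is my guess at the
layout), so no hand-maintained list has to be kept up to date.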

Thanks,

Hrishikesh


On Thu, Mar 30, 2017 at 6:32 PM, Ryan Curtin <[email protected]> wrote:

> On Tue, Mar 28, 2017 at 10:30:49PM +0530, Shikhar Bhardwaj wrote:
> > Hello everyone!
> >
> > I have been looking through the mlpack codebase to find sections which
> > could benefit from a multi-threaded implementation. A couple of papers
> > also shed some light on the implementation of machine learning
> > algorithms in a multi-threaded setting:
> >
> > 1. Map-Reduce for Machine Learning on Multicore
> > <https://papers.nips.cc/paper/3150-map-reduce-for-machine-learning-on-multicore.pdf>
> > 2. Parallelizing Machine Learning Algorithms
> > <http://cs229.stanford.edu/proj2010/BatizBenetSlackSparksYahya-ParallelizingMachineLearningAlgorithms.pdf>
> >
> > I have listed out some algorithms which can be implemented in this
> > manner.
> >
> > One idea that I had was to parallelize testing. Currently, mlpack builds
> > a single mlpack_test executable, which runs the tests on a single thread.
> > Instead, we can build multiple test executables, and use CMake's ctest
> > tool to run those tests, with as many jobs as the number of extra threads
> > we have to spare. More on this here
> > <https://baptiste-wicht.com/posts/2012/10/run-boost-test-parallel-cmake.html>.
> > This can significantly reduce testing time, and help in reducing the time
> > for the complete matrix builds planned in the future.
> >
> > This wouldn't interfere with the aim of having a single, verifiable
> > command for users to test the library, as mentioned in issue #137
> > <https://github.com/mlpack/mlpack/issues/137>.
> >
> > Any thoughts?
>
> Hi Shikhar,
>
> I'm not sure if a map-reduce approach would be the best way to go.  This
> would probably involve some significant restructuring, but ideally we
> should try to keep the code as easy to read as possible (I know this can
> be hard with C++ but we can at least try).
>
> I like the idea of parallel testing like you suggested but we would need
> a way to extract all of the test cases or test suites from the code---we
> should avoid having a file that contains the names of all test files,
> because it will be very easy for that to go out of date.
>
> Thanks,
>
> Ryan
>
> --
> Ryan Curtin    | "Rock-and-rollers don't bathe."
> [email protected] |   - Sneed
> _______________________________________________
> mlpack mailing list
> [email protected]
> http://knife.lugatgt.org/cgi-bin/mailman/listinfo/mlpack

