On Mon, May 22, 2017 at 08:24:29PM +0530, Shikhar Bhardwaj wrote:
> Thanks for the replies everyone.
>
> The primary goal of implementing something like ExecutionPolicy was to make
> writing code more consistent with the rest of the methods in the library
> and if possible introduce an abstraction over the parallelism offered by
> OpenMP.
>
> For example, instead of introducing "ParallelSGD" (a separate optimizer
> with the same DecomposableFunction and UpdatePolicy policies), we could add
> a template parameter on the existing optimizer SGD, which would select the
> appropriate implementation (parallel or sequential) depending on the
> template parameter passed. This would speed up benchmarking and prototyping
> and keep the number of different methods(in their core logic) minimum in
> number.
>
> From the above discussion, I could understand that controlling the number
> of threads from ExecutionPolicy may not be a good idea, as OpenMP already
> gives overrides of that decision to the user in the form of the environment
> variables.

Hi Shikhar,

I would really advise avoiding introducing an abstraction over OpenMP.
The reason I say this is that if someone doesn't know mlpack and wants
to contribute, there are some technologies they will have to learn:

 - Armadillo
 - some template metaprogramming
 - parts of Boost
 - the STL
 - maybe OpenMP
 - the various bits of mlpack core functionality that are used all over

Note that all of those, with the exception of the last, are something
that a contributor might know from elsewhere. When we start to
introduce abstractions over libraries that we are using, then people who
know those libraries now need to also learn our abstractions too,
instead of just using the knowledge they already had (which is
transferable to other situations). Even if the abstraction turns out to
be easy to learn, there is a mental hurdle to overcome, and also at
first glance the abstraction may not appear to be easy to learn.

In 2010 when we refactored mlpack in full, one idea being floated around
was to wrap Armadillo functionality entirely, so that we could replace
it with another matrix library if, e.g., Armadillo ever died or a better
competitor came along. (In my view neither has happened, but my
perspective is admittedly biased.) We ended up deciding against this
approach for a couple of reasons:

 - the maintenance overhead of the abstraction itself; for Armadillo
   that would be a huge amount of code

 - the code we ended up writing would not look like Armadillo or any
   other matrix library, we'd essentially have code that looked like our
   own abstraction only, and this could cause people to avoid
   contributing to or using mlpack because of the unfamiliarity

So I would really suggest that we consider the exact benefits of the
ExecutionPolicy idea as compared to the existing functionality that
OpenMP already gives us through environment variables. Otherwise we
introduce complexity and maintenance with little gain (other than some
abstraction of OpenMP).
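
As a rough illustration of using OpenMP directly, here is a minimal
sketch. SumOfSquares() is a hypothetical helper made up for this example,
not anything in mlpack; the point is only that a plain OpenMP pragma
needs no extra template parameter, and the thread count stays under the
user's control through the standard OMP_NUM_THREADS environment variable.

```cpp
#include <vector>

// Hypothetical helper for illustration only; not part of mlpack.
// Sums the squares of the given values.
double SumOfSquares(const std::vector<double>& values)
{
  double sum = 0.0;
  // A signed loop index keeps this compatible with older OpenMP versions.
  const long n = static_cast<long>(values.size());

  // Each thread accumulates a private partial sum; OpenMP combines them
  // when the loop ends.  How many threads run is decided by the OpenMP
  // runtime (e.g. via OMP_NUM_THREADS), not by this code.
  #pragma omp parallel for reduction(+:sum)
  for (long i = 0; i < n; ++i)
    sum += values[i] * values[i];

  return sum;
}
```

When built with OpenMP enabled (for example -fopenmp with GCC), running
the program as "OMP_NUM_THREADS=1 ./program" uses a single thread for the
loop. When built without OpenMP support, the pragma is ignored and the
same code runs sequentially.
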

Nothing that's currently in mlpack is parallelized in any other way than
OpenMP so I'm not sure that an abstraction would get us any more
consistency than we already have.

So, if we are going to say 'every mlpack class has to have an
ExecutionPolicy template parameter', then there must be a very good
reason for it---otherwise, we're making contributing and maintenance
significantly harder. The amount of overhead and learning necessary to
contribute to mlpack is already pretty high, and I want to avoid making
that overhead more.

Let me know what you think.

Thanks,

Ryan

-- 
Ryan Curtin       | "Maybe the next time."
[email protected] |   - J.G. Ballard
