On Sat, Mar 25, 2017 at 09:49:04PM +0800, Thyrix Yang wrote:
> Hi,
>
> I've looked through the methods in benchmark system and mlpack,
> here is a list of some not benchmarked methods:
>
> 1. adaBoost
> 2. ann
> 3. dbscan
> 4. decision tree(some pr have been made)
> 5. gmm
> 6. hoeffding tree
> 7. mean shift clustering
> 8. svd
> 9. softmax regression
>
> As mentioned in GSoC idea list, one choice is to benchmark some of
> these methods against other implementation. I think it's not hard for
> me, the main work is reading API of mlpack and other libraries.
>
> Another idea is to speed up some method in mlpack, this one is much more
> difficult, and time consuming. Even though this idea is appealing to me,
> I don't have much confidence on the target of "the fastest of all the
> implementations". I think how much can we improve the speed and how to do it
> can only reveal after I have done enough research and experiment on one
> method.
>
> I plan to take the benchmarking script as the base of my proposal, and
> if some method is slower, try to do some analysis. If I find
> something, I may start to do the speed up task on one method.
>
> There is no executable of ann now, so I should write a executable for
> this task and do benchmarking on it? Or is it the time to provide a
> executable for ann? (seems ann is in developing now, if it's needed
> to write a wrapper of whole ann or a specific type of ann, I'm glad to
> do this work)

Hi Thyrix,

Thanks for taking a look into this.  I agree that speeding up a method
in mlpack can be time-consuming and risky---even after some weeks of
looking into it, the conclusion may be that there is no reasonable way
to speed it up.

I think that it would be reasonable, just like you wrote, to put
together a proposal that added benchmarking for some (or all?) of 8 of
the 9 methods you pointed out that aren't currently being benchmarked
(not ann, see below).  But we would also need to add benchmarking
scripts for other libraries, otherwise we would not have anything to
compare against.

You could also add an algorithm you'd like to speed up, or leave time in
your timeline for accelerating some algorithms if you find a slow one,
just like you wrote.  Definitely there is the possibility that no gain
can be made, but that's ok---the project would not be a failure because
of that or anything.

For the ANN code, I don't think it's easy to write a single command-line
program for it---users may have different network structures that they
want to implement.  In that case it may be better to have a 'model zoo',
which there is an issue open to discuss:

https://github.com/mlpack/mlpack/issues/870

Let me know if I can clarify anything.

Thanks,

Ryan

-- 
Ryan Curtin       | "Where we're going, we won't need eyes to see."
[email protected] |   - Dr. Weir
