On Sat, Mar 25, 2017 at 09:49:04PM +0800, Thyrix Yang wrote:
> Hi,
> 
> I've looked through the methods in the benchmark system and mlpack;
> here is a list of some methods that are not yet benchmarked:
> 
> 1. adaBoost
> 2. ann
> 3. dbscan
> 4. decision tree (some PRs have been made)
> 5. gmm
> 6. hoeffding tree
> 7. mean shift clustering
> 8. svd
> 9. softmax regression
> 
> As mentioned in the GSoC idea list, one choice is to benchmark some of
> these methods against other implementations. I think it's not hard for
> me; the main work is reading the APIs of mlpack and the other libraries.
> 
> Another idea is to speed up some method in mlpack; this one is much more
> difficult and time-consuming. Even though this idea is appealing to me,
> I don't have much confidence in the target of "the fastest of all the
> implementations". I think how much we can improve the speed, and how to
> do it, will only become clear after I have done enough research and
> experiments on one method.
> 
> I plan to take the benchmarking script as the base of my proposal, and
> if some method is slower, try to do some analysis. If I find
> something, I may start to do the speed up task on one method.
> 
> There is no executable for ann now, so should I write an executable for
> this task and benchmark it? Or is it time to provide an
> executable for ann?  (It seems ann is under development now; if a
> wrapper for the whole of ann or for a specific type of ann is needed,
> I'm glad to do this work.)

Hi Thyrix,

Thanks for taking a look into this.  I agree that speeding up a method
in mlpack can be time-consuming and risky---even after some weeks of
looking into it, the conclusion may be that there is no reasonable way
to speed it up.

I think that it would be reasonable, just like you wrote, to put
together a proposal that adds benchmarking for some (or all?) of the
methods you pointed out that aren't currently being benchmarked, other
than ann (see below).  But we would also need to add benchmarking
scripts for other libraries; otherwise we would not have anything to
compare against.
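
As a rough illustration of comparing one method across libraries, a
timing script could look like the sketch below.  This is only a sketch
and not the benchmark system's actual interface: `time_method`,
`compare`, and the two stand-in functions are hypothetical names, and a
real script would call into mlpack and the other library instead.

```python
import statistics
import time


def time_method(fn, data, repeats=5):
    """Return the median wall-clock time (seconds) of calling fn(data)."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(data)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def compare(implementations, data, repeats=5):
    """Time each named implementation on the same data.

    `implementations` maps a library name to a callable.  Returns a dict
    mapping each name to its median runtime in seconds.
    """
    return {name: time_method(fn, data, repeats)
            for name, fn in implementations.items()}


# Hypothetical stand-ins for two libraries' versions of the same method.
def library_a_sum_of_squares(data):
    return sum(x * x for x in data)


def library_b_sum_of_squares(data):
    total = 0
    for x in data:
        total += x * x
    return total


if __name__ == "__main__":
    data = list(range(100000))
    results = compare({"library_a": library_a_sum_of_squares,
                       "library_b": library_b_sum_of_squares}, data)
    # Print the fastest implementation first.
    for name, seconds in sorted(results.items(), key=lambda kv: kv[1]):
        print(f"{name}: {seconds:.6f}s")
```

Using the median over several runs keeps a single slow run from
skewing the comparison.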

You could also add an algorithm you'd like to speed up, or leave time in
your timeline for accelerating some algorithms if you find a slow one,
just like you wrote.  Definitely there is the possibility that no gain
can be made, but that's ok---the project would not be a failure because
of that or anything.

For the ANN code, I don't think it's easy to write a single command-line
program for it---users may have different network structures that they
want to implement.  In that case it may be better to have a 'model zoo';
there is an issue open to discuss that:

https://github.com/mlpack/mlpack/issues/870

Let me know if I can clarify anything.

Thanks,

Ryan

-- 
Ryan Curtin    | "Where we're going, we won't need eyes to see."
[email protected] |   - Dr. Weir
_______________________________________________
mlpack mailing list
[email protected]
http://knife.lugatgt.org/cgi-bin/mailman/listinfo/mlpack
