Hi guys,

Shikhar is working on his project to profile different mlpack algorithms
and identify potential bottlenecks he could then parallelize. He found a
paper (
https://papers.nips.cc/paper/3150-map-reduce-for-machine-learning-on-multicore.pdf)
which adapts the MapReduce paradigm to a number of algorithms, including
Naive Bayes, so he started by profiling that one.

However, he and I have been struggling to find a dataset on which the
algorithm actually takes a significant amount of time: the
mlpack::data::Load() calls take 2-3 orders of magnitude longer than the
Train() and Classify() calls themselves.

We were wondering:

   - Has anybody come across any use cases where NBC is slow enough to be
   worth parallelizing?
   - Does anyone have any tips on profiling the algorithm so that data
   loading is excluded, letting us focus on the parts we can actually
   improve?

Thanks a lot in advance :)
_______________________________________________
mlpack mailing list
[email protected]
http://knife.lugatgt.org/cgi-bin/mailman/listinfo/mlpack
