Hi all, Shikhar is working on a project to profile different mlpack algorithms and identify bottlenecks he could then parallelize. He found a paper (https://papers.nips.cc/paper/3150-map-reduce-for-machine-learning-on-multicore.pdf) that adapts the MapReduce paradigm to several algorithms, including Naive Bayes, so he started by profiling that one.
However, he and I have been struggling to find a dataset on which the algorithm actually takes a significant amount of time: the mlpack::data::Load() call takes 2-3 orders of magnitude longer than Train() and Classify(). We were wondering:

- Has anybody come across any use cases where NBC is slow enough to be worth parallelizing?
- Does anyone have tips on profiling the algorithm so that data loading is excluded, letting us focus on the parts we can actually improve?

Thanks a lot in advance :)
_______________________________________________ mlpack mailing list [email protected] http://knife.lugatgt.org/cgi-bin/mailman/listinfo/mlpack
