On Mon, Mar 21, 2016 at 08:21:27AM -0300, Marcos Pividori wrote:
> Dear all,
>
> I am Marcos Pividori, a Computer Science student from the National
> University of Rosario, Argentina ([1]). I am currently finishing my
> degree (a five-year program).
> I am writing because I am really interested in working on this GSoC
> project for mlpack.
Hi Marcos,

Thanks for getting in touch.

> I have been reading the papers recommended for this topic. I think I
> have a general idea of the main methods for the approximate nearest
> neighbor problem.
> I have also installed the mlpack library and looked at the main code.
> So, I would split the project into these main goals:
>  + Implementation of approximate nearest neighbor search.
>  + Implementation of a command-line tool (a simple task).
>  + Comparisons between the mlpack implementation and similar
>    libraries, and writing proper documentation.
>
> About the implementation of ANN: do you have a preference on which
> method to use? Should we implement several of these methods and
> compare their results?
> I have found many different options in the literature:
>  + "An Optimal Algorithm for Approximate Nearest Neighbor Searching in
>    Fixed Dimensions" proposes BBD-trees for the computation, and it is
>    implemented in the ANN library (which also uses kd-trees). But I
>    couldn't find more information on BBD-trees.
>  + "An Investigation of Practical Approximate Nearest Neighbor
>    Algorithms" proposes spill trees (possibly hybrid), which allow
>    overlap when separating points.
>  + "Fast Approximate Nearest Neighbors with Automatic Algorithm
>    Configuration" reviews and compares the state of the art in ANN
>    search. It identifies randomized kd-trees and hierarchical k-means
>    trees as the most promising methods according to its results. The
>    FLANN library implements many of these methods.
>  + The scikit-learn library implements locality-sensitive hashing for
>    ANN search.

mlpack has an LSH implementation already (see src/mlpack/methods/lsh/).
The easiest (and best, in my opinion) route to a good approximate
nearest neighbor search algorithm is to extend the existing dual-tree
algorithm to be an approximate algorithm instead of an exact one.
If you read the paper "Tree-Independent Dual-Tree Algorithms" you will
see that the pruning rule for dual-tree nearest neighbor search is of
the form "prune if d_min(N_q, N_r) > B(N_q)". This can easily be made
approximate by changing it to something like "prune if
d_min(N_q, N_r) > B(N_q) - epsilon", so that node pairs only slightly
inside the bound are also pruned. That would be a fairly simple change
(or generalization) of the algorithm that's implemented.

But I don't think that's enough work for the whole summer, so one of
the papers you referenced might be worth implementing. I might consider
implementing spill trees and then the defeatist search used by that
algorithm. Then you could spend the rest of the time comparing and
benchmarking implementations.

I hope this is helpful.

Thanks,

Ryan

--
Ryan Curtin      | "He takes such wonderful pictures with
[email protected] | his paws." - Radio man

_______________________________________________
mlpack mailing list
[email protected]
https://mailman.cc.gatech.edu/mailman/listinfo/mlpack
