Dear all, I am Marcos Pividori, a Computer Science student at the National University of Rosario, Argentina ([1]). I am currently finishing my five-year degree. I am writing because I am really interested in working on this GSoC project for mlpack.
After reading the different ideas, I became really interested in the project "Approximate Nearest Neighbor Search". I have read the discussions about this topic on the mlpack list, and I analyzed the papers recommended for this project. I know there is not much time left before the application deadline, and I apologize for that; I thought it would be appropriate to read the main documentation and get a general understanding of the project before writing.

I have participated in GSoC projects in the past: GSoC 2013 ([2][3]) for Haskell.org and GSoC 2015 ([4][5]) for OpenCog. In 2014, I applied and was accepted for an internship at iRobot, working for 4 months at their offices in Pasadena, US, on improving the software for the Roomba product. As the final project for my AI course, I developed a computer Go player in C++, based on the Monte Carlo Tree Search algorithm. In August 2015, I presented it at the EST 2015 Student Contest (44th Argentine Conference on Informatics), where I took first place ([6][7]).

I would like you to know that I am definitely interested in this project. I am at the final stage of my degree and have to start working on my graduation thesis, but I haven't found a thesis topic yet; I see algorithms and data structures applied to ML as a potential topic. So, this GSoC project would give me the opportunity to go into depth in this area, working on a popular library and gaining valuable experience from the community.

I have been reading the papers recommended for this topic, and I think I have a general idea of the main methods for the approximate nearest neighbor problem. I have also installed the mlpack library and looked at the main code. I would split the project into these main goals:
+ Implementation of approximate nearest neighbor search.
+ Implementation of a command line tool (a simple task).
+ Comparisons between the mlpack implementation and similar libraries, and writing proper documentation.
About the implementation of ANN: do you have a preference for which method to use? Should we implement several of these methods and compare their results? I have found many different options in the literature:
+ "An Optimal Algorithm for Approximate Nearest Neighbor Searching in Fixed Dimensions" proposes BBD-trees for the computation, and it is implemented in the ANN library (which also uses kd-trees), but I couldn't find much more information on BBD-trees.
+ "An Investigation of Practical Approximate Nearest Neighbor Algorithms" proposes spill-trees (possibly hybrid), which allow overlap when splitting points between children.
+ "Fast Approximate Nearest Neighbors with Automatic Algorithm Configuration" reviews and compares the state of the art in ANN search. It identifies randomized kd-trees and hierarchical k-means trees as the most promising methods according to its results; the FLANN library implements many of these methods.
+ The scikit-learn library implements locality-sensitive hashing (LSH) for ANN.

I would be grateful if you could give me your opinion on this. Do you think I should focus on something in particular? I will continue reading about this topic and preparing a proposal.

Thanks,
Marcos

[1] http://www.unr.edu.ar
[2] http://gsoc2013cwithmobiledevices.blogspot.com.ar
[3] http://github.com/MarcosPividori/GSoC-Communicating-with-mobile-devices
[4] http://wiki.opencog.org/wikihome/index.php/Haskell_Bindings_-_GSoC_2015
[5] http://github.com/MarcosPividori/atomspace/tree/master/opencog/haskell
[6] http://44jaiio.sadio.org.ar/sites/default/files/est215-224.pdf
[7] http://github.com/MarcosPividori/Go-player
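P.S. As an aside, here is a tiny self-contained sketch of the random-projection LSH idea mentioned above, just to make the bucketing intuition concrete. This is illustration only, not mlpack's or scikit-learn's API; the hyperplanes are fixed by hand so the result is reproducible, whereas a real implementation would draw them at random (e.g. from a Gaussian):

```python
def lsh_hash(point, hyperplanes):
    """One hash bit per hyperplane: the sign of the dot product."""
    return tuple(1 if sum(p * h for p, h in zip(point, plane)) >= 0 else 0
                 for plane in hyperplanes)

# Hand-picked hyperplanes for reproducibility (normally random Gaussians).
hyperplanes = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1), (1, -1, 0)]

points = [(1.0, 0.9, 1.1),      # index 0: near the query
          (1.1, 1.0, 0.9),      # index 1: near the query
          (-5.0, -4.8, -5.2)]   # index 2: far away

# Build the hash table: nearby points tend to fall into the same bucket.
table = {}
for i, p in enumerate(points):
    table.setdefault(lsh_hash(p, hyperplanes), []).append(i)

# Query time: only points in the query's bucket are examined exactly,
# which is what makes the search approximate but fast.
query = (1.05, 0.95, 1.0)
candidates = table.get(lsh_hash(query, hyperplanes), [])
print(candidates)  # the two nearby points share the query's bucket: [0, 1]
```

In practice one uses several independent tables to reduce the chance of missing a true neighbor that lands in a different bucket.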
_______________________________________________
mlpack mailing list
[email protected]
https://mailman.cc.gatech.edu/mailman/listinfo/mlpack
