Daniel,

This is a bit confusing. How do you do kNN with O(1) memory? Sampling? Or does each mapper find the n nearest neighbors in a slice of the training data and then pass those on to the reducer, which keeps the k best from all the mappers for a particular input?

Also, what is the standard input to the mapper? The training data, with the instances to classify on the side to be read by all mappers?

On Sat, Mar 26, 2011 at 1:40 PM, Daniel McEnnis <[email protected]> wrote:
> Josh,
>
> The initial plan is to keep it quite simple. No ball trees, no
> enhancements. Ball trees are likely to require each node to have in
> memory the ground truth - too high a memory requirement. The goal is
> a simple KNN that uses O(c) memory in a map stage that assigns the
> class. Not very interesting.
>
> Daniel.
>
> On Sat, Mar 26, 2011 at 4:23 PM, Josh Patterson <[email protected]> wrote:
> > What kind of approach would you use? I've done one of these before
> > with a ball tree, which was effective. I'd be interested in working on
> > spatial trees in Mahout.
> >
> > Josh
> >
> > On Saturday, March 26, 2011, Daniel McEnnis <[email protected]> wrote:
> >> Dear Mahout developers,
> >>
> >> While I'm learning the code, I thought I'd ask if there was any
> >> objection to me working on a KNN classifier module as my learning
> >> project. I should be able to make this at worst O(n) space over the
> >> training set and O(c) space over the input set using MapReduce. It's
> >> something I'm quite familiar with and fills a gap in the classifier
> >> portfolio.
> >>
> >> Sincerely,
> >>
> >> Daniel McEnnis.
> >>
> >
> > --
> > Twitter: @jpatanooga
> > Solution Architect @ Cloudera
> > hadoop: http://www.cloudera.com
> > blog: http://jpatterson.floe.tv
> >
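For what it's worth, here is a minimal single-machine sketch of the scheme described above: each "mapper" scans one slice of the training data and keeps only the k nearest candidates per query (so its memory is O(k) per query, independent of training-set size), and the "reducer" merges the candidates from all mappers, keeps the k best overall, and votes. This is an illustration only, assuming Euclidean distance and majority voting; the function names are made up and are not Mahout or Hadoop APIs.

```python
import heapq
from collections import Counter

def knn_map(train_slice, queries, k):
    """'Mapper': for each query, keep only the k nearest training points
    in this slice -- memory is O(k) per query, not O(|training set|)."""
    out = {}
    for qid, q in queries.items():
        # Max-heap via negated distances, so heap[0] is the worst kept candidate.
        heap = []  # entries: (-squared_distance, label)
        for x, label in train_slice:
            d = sum((a - b) ** 2 for a, b in zip(q, x))
            if len(heap) < k:
                heapq.heappush(heap, (-d, label))
            elif -d > heap[0][0]:          # closer than the worst kept candidate
                heapq.heapreplace(heap, (-d, label))
        out[qid] = [(-nd, label) for nd, label in heap]
    return out

def knn_reduce(partials, k):
    """'Reducer': merge candidates from all mappers, keep the k nearest
    overall per query, and assign the majority class."""
    merged = {}
    for part in partials:
        for qid, cands in part.items():
            merged.setdefault(qid, []).extend(cands)
    result = {}
    for qid, cands in merged.items():
        best = heapq.nsmallest(k, cands)   # k smallest distances overall
        result[qid] = Counter(label for _, label in best).most_common(1)[0][0]
    return result
```

Example use, with two training slices standing in for two mappers' inputs and the queries broadcast to both (as in the "instances to classify on the side" reading):

```python
s1 = [((0.0, 0.0), "a"), ((0.0, 1.0), "a"), ((5.0, 5.0), "b")]
s2 = [((1.0, 0.0), "a"), ((6.0, 5.0), "b"), ((5.0, 6.0), "b")]
qs = {"q1": (0.1, 0.1), "q2": (5.1, 5.1)}
knn_reduce([knn_map(s1, qs, 3), knn_map(s2, qs, 3)], 3)  # {"q1": "a", "q2": "b"}
```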
