Ted,

Please keep in mind that I only downloaded the Mahout code today; my
knowledge comes from a single presentation and Cloudera's Hadoop
tutorial.  My goal is to have two stages: training takes a sequence
of vectors and their classifications and writes a large HDFS file of
vector-classification pairs.  That file is then streamed to each node
classifying an incoming set of vectors.  Each training vector is
compared against the vector to be classified, and a table of the k
best matches is built from these comparisons.  Majority vote wins,
producing key-classification or classification-key output.  Because
the training file is streamed, only k+2 vectors need to be in memory
at once, giving O(1) memory use and embarrassingly parallel execution.
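To make the streaming scan concrete, here is a minimal Python sketch (not Mahout code; the function names are hypothetical, and it assumes the training data arrives as an iterable of (vector, label) pairs, as the streamed file above would provide):

```python
import heapq
from collections import Counter

def euclidean_sq(a, b):
    """Squared Euclidean distance; any metric would do here."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def classify_streaming_knn(query, training_stream, k, distance):
    """Classify `query` by streaming over (vector, label) training pairs,
    keeping only the k nearest seen so far (bounded memory)."""
    # Max-heap of size k via negated distances: the worst of the current
    # k best sits at the top and is evicted when a closer point arrives.
    heap = []  # entries: (-distance, label)
    for vec, label in training_stream:
        d = distance(query, vec)
        if len(heap) < k:
            heapq.heappush(heap, (-d, label))
        elif -d > heap[0][0]:  # closer than the current worst of the k best
            heapq.heapreplace(heap, (-d, label))
    # Majority vote among the k nearest neighbours.
    votes = Counter(label for _, label in heap)
    return votes.most_common(1)[0][0]
```

Only the query vector and the k-entry heap stay resident; each training vector is visited once and discarded.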

Daniel

On Sat, Mar 26, 2011 at 5:25 PM, Ted Dunning <[email protected]> wrote:
> Daniel,
> This is a bit confusing.
> How do you do kNN with O(1) memory?  Sampling?
> Or does each mapper find n nearest neighbors in a slice of the training data
> and then pass that on to the reducer which keeps the k best from all the
> mappers for a particular input?
> Also, what is the standard input to the mapper?  The training data with the
> instances to classify on the side to be read by all mappers?
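That slice-and-merge layout is closest to what I mean. Sketched in plain Python rather than Hadoop (the names are hypothetical, and each "shard" stands in for one mapper's slice of the training data):

```python
import heapq
from collections import Counter

def euclidean_sq(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mapper_local_topk(query_id, query, shard, k, distance):
    # Each mapper scans only its slice of the training data and emits
    # the k nearest neighbours it found there.
    best = heapq.nsmallest(k, ((distance(query, v), lbl) for v, lbl in shard))
    return [(query_id, d, lbl) for d, lbl in best]

def reducer_global_topk(emitted, k):
    # The reducer merges every mapper's candidates per query, keeps the
    # k best overall, and assigns the class by majority vote.
    by_query = {}
    for qid, d, lbl in emitted:
        by_query.setdefault(qid, []).append((d, lbl))
    return {qid: Counter(lbl for _, lbl in
                         heapq.nsmallest(k, cands)).most_common(1)[0][0]
            for qid, cands in by_query.items()}
```

Each mapper needs only O(k) memory for its slice, and the reducer sees at most k candidates per mapper per query.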
>
> On Sat, Mar 26, 2011 at 1:40 PM, Daniel McEnnis <[email protected]> wrote:
>>
>> Josh,
>>
>> The initial plan is to keep it quite simple: no ball trees, no
>> enhancements.  Ball trees would likely require each node to hold the
>> ground truth in memory, which is too high a memory requirement.  The
>> goal is a simple KNN that uses O(c) memory in a map stage that assigns
>> the class.  Not very interesting.
>>
>> Daniel.
>>
>> On Sat, Mar 26, 2011 at 4:23 PM, Josh Patterson <[email protected]> wrote:
>> > What kind of approach would you use? I've done one of these before
>> > with a ball tree, which was effective. I'd be interested in working
>> > on spatial trees in Mahout.
>> >
>> > Josh
>> >
>> > On Saturday, March 26, 2011, Daniel McEnnis <[email protected]> wrote:
>> >> Dear Mahout developers,
>> >>
>> >> While I'm learning the code, I thought I'd ask if there was any
>> >> objection to my working on a KNN classifier module as my learning
>> >> project.  I should be able to make this at worst O(n) space over the
>> >> training set and O(c) space over the input set using MapReduce.  It's
>> >> something I'm quite familiar with, and it fills a gap in the
>> >> classifier portfolio.
>> >>
>> >> Sincerely,
>> >>
>> >> Daniel McEnnis.
>> >>
>> >
>> > --
>> > Twitter: @jpatanooga
>> > Solution Architect @ Cloudera
>> > hadoop: http://www.cloudera.com
>> > blog: http://jpatterson.floe.tv
>> >
>
>