Sure NJ.

Thanks!




Auon

________________________________
From: Nandish Jayaram <[email protected]>
Sent: Tuesday, December 13, 2016 12:22:50 PM
To: [email protected]
Subject: Re: Adding KNN to madlib

Hi Auon,

I do see the pull request, thank you! Folks in the community should also be
able to comment on it! :)
I too will have a look at it sometime soon and comment on the PR if need be.

NJ

On Mon, Dec 12, 2016 at 6:30 PM, Kazmi,Auon H <[email protected]> wrote:

> Hi NJ,
>
> I have done that. Please check if it is done correctly.
>
>
>
>
> Thanks,
>
> Auon
>
> ________________________________
> From: Nandish Jayaram <[email protected]>
> Sent: Monday, December 12, 2016 6:28:38 PM
> To: [email protected]
> Subject: Re: Adding KNN to madlib
>
> Hi Auon,
>
> Please push all the changes you have made in your branch for KNN to your
> incubator-madlib repo, and open a PR on that push.
>
> NJ
>
> On Mon, Dec 12, 2016 at 1:58 PM, Kazmi,Auon H <[email protected]> wrote:
>
> > Hi NJ,
> >
> > Where should I git push my code? I am pushing it to my own GitHub account.
> > Also, should I push just the KNN folder or the whole src/ folder of madlib?
> >
> >
> >
> > Regards,
> >
> > Auon
> >
> > ________________________________
> > From: Kazmi,Auon H <[email protected]>
> > Sent: Monday, December 5, 2016 8:32:38 PM
> > To: [email protected]
> > Subject: Re: Adding KNN to madlib
> >
> > Hi NJ,
> >
> > Thanks!
> >
> > I will do that.
> >
> >
> >
> >
> > Regards,
> >
> > Auon
> >
> > ________________________________
> > From: Nandish Jayaram <[email protected]>
> > Sent: Sunday, December 4, 2016 1:39:53 PM
> > To: [email protected]
> > Subject: Re: Adding KNN to madlib
> >
> > Hi Auon,
> >
> > That's great!
> > I think the best way to share your code with the community is by opening a
> > pull request on github. Please do that and a lot of folks will be able to
> > comment and give suggestions to you.
> >
> > NJ
> >
> > On Sat, Dec 3, 2016 at 2:13 PM, Kazmi,Auon H <[email protected]> wrote:
> >
> > > Hi NJ,
> > >
> > > I got the solution to my problem.
> > >
> > > So, I might be done with my first version of interface of KNN for
> > > classification as suggested by you, by Monday or so. I will generalise it
> > > for regression and then please let me know how to share it with you guys.
> > > After that, I can start making required changes as and when needed.
> > >
> > >
> > >
> > > regards,
> > >
> > > Auon Haidar
> > >
> > > ________________________________
> > > From: Kazmi,Auon H <[email protected]>
> > > Sent: Thursday, December 1, 2016 2:59:21 PM
> > > To: [email protected]
> > > Subject: Re: Adding KNN to madlib
> > >
> > > Hi NJ,
> > >
> > > No, this is just an example I gave. So, in a postgres function, I want to
> > > iterate over the rows of a table whose name is given as a VARCHAR argument.
> > >
> > > FOR r IN EXECUTE format('SELECT * FROM %I', point_source)
> > >
> > > will do that. Now, r is a record, i.e. a row of table 'point_source'. I
> > > want to store a particular column of that row r in a variable. Now, this
> > > column name is also passed as a VARCHAR argument to the function. I am not
> > > able to figure out the way to access this particular column from the
> > > current row 'r'.
> > >
> > >
> > > Basically, I am trying to iterate over my testing data one by one and pass
> > > its vector column to a function that finds its label.
> > >
> > >
> > >
> > > Regards,
> > >
> > > Auon
> > >
> > >
> > > ________________________________
> > > From: Nandish Jayaram <[email protected]>
> > > Sent: Thursday, December 1, 2016 2:51:47 PM
> > > To: [email protected]
> > > Subject: Re: Adding KNN to madlib
> > >
> > > Hi Auon,
> > >
> > > My apologies for the late reply.
> > > Can you please give me more information regarding the design approach you
> > > have taken? Information like what files you have created so far would be
> > > helpful. I am not sure I understand your approach correctly yet. Is the
> > > above snippet of code the only code you have, or do you have some other
> > > files too?
> > >
> > > NJ
> > >
> > > On Tue, Nov 29, 2016 at 10:06 PM, Kazmi,Auon H <[email protected]> wrote:
> > >
> > > > Hi NJ,
> > > >
> > > > I got stuck at a place. Need a little help.
> > > >
> > > > Suppose I have a function that receives table_name and column_name as
> > > > varchar.
> > > >
> > > > Now I would like to iterate through each row of this table, while
> > > > accessing the value of this column. I am doing something like this:
> > > >
> > > >
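> > > > CREATE OR REPLACE FUNCTION Foo(
> > > >     table_name VARCHAR,
> > > >     column_name VARCHAR
> > > > ) RETURNS VOID AS
> > > > $BODY$
> > > > DECLARE
> > > >     r record;
> > > >     b integer;
> > > > BEGIN
> > > >     -- Iterate over every row of the table whose name was passed in.
> > > >     FOR r IN EXECUTE format('SELECT * FROM %I', table_name)
> > > >     LOOP
> > > >         -- Problem line: this looks for a field literally named
> > > >         -- "column_name" in r, not the column named by the argument.
> > > >         b := r.column_name;
> > > >     END LOOP;
> > > > END;
> > > > $BODY$ LANGUAGE plpgsql;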
> > > >
> > > > So, everything works except that column_name is a varchar, so
> > > > r.column_name won't give me the corresponding column's value in the
> > > > extracted row r. Suppose the column is 'pid' in the given table; then
> > > > b := r.pid will give the right result, but I want to get this effective
> > > > statement from
> > > > b := r.column_name;
> > > >
> > > >
> > > > Could you please help.
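
One common way around this is to put the column name into the dynamic query itself and alias it to a fixed name, so the loop body never needs a dynamic field lookup. In PL/pgSQL that would presumably look like format('SELECT %I AS val FROM %I', column_name, table_name) followed by b := r.val. The Python sketch below only illustrates the same idea; it uses the standard-library sqlite3 module as a stand-in for Postgres, and quote_ident, column_values, and the sample table contents are hypothetical names and data, not anything from the thread.

```python
import sqlite3


def quote_ident(name):
    """Quote an SQL identifier by doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def column_values(conn, table_name, column_name):
    """Return one column's values, given table and column names as strings."""
    # Alias the dynamically chosen column to the fixed name "val", so the
    # loop body can always read row["val"], whichever column was requested.
    query = "SELECT {} AS val FROM {}".format(
        quote_ident(column_name), quote_ident(table_name)
    )
    values = []
    for row in conn.execute(query):
        values.append(row["val"])
    return values


# Hypothetical sample data standing in for the 'point_source' table.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # allows access by column name
conn.execute("CREATE TABLE point_source (pid INTEGER, label TEXT)")
conn.executemany(
    "INSERT INTO point_source VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")]
)

pids = column_values(conn, "point_source", "pid")
labels = column_values(conn, "point_source", "label")
print(pids, labels)
```

The identifier quoting matters because both names arrive as plain strings; pasting them into the query unquoted would break on unusual names and invite SQL injection.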
> > > >
> > > >
> > > >
> > > > Regards,
> > > >
> > > > Auon
> > > >
> > > > ________________________________
> > > > From: Kazmi,Auon H <[email protected]>
> > > > Sent: Friday, November 25, 2016 3:23:46 PM
> > > > To: [email protected]
> > > > Subject: Re: Adding KNN to madlib
> > > >
> > > > Thanks NJ,
> > > >
> > > > I will move forward in the suggested way.
> > > >
> > > >
> > > >
> > > >
> > > > Regards,
> > > >
> > > > Auon
> > > >
> > > > ________________________________
> > > > From: Nandish Jayaram <[email protected]>
> > > > Sent: Wednesday, November 23, 2016 12:20:35 PM
> > > > To: [email protected]
> > > > Subject: Re: Adding KNN to madlib
> > > >
> > > > Hey Auon,
> > > >
> > > > Starting with only classification for now sounds like a good idea!
> > > > Yes, the output should be just the predicted label for each row.
> > > > If the table you want to run the classification task on is like the
> > > > following:
> > > >  id |  x  |  y
> > > > ----+-----+------
> > > >   1 |  10 | 10.5
> > > >   2 |  30 | 31.5
> > > >   3 |  20 | 22.5
> > > >
> > > > then the output table could be something like the following:
> > > >
> > > >  id |  x  |  y   | predicted_label
> > > > ----+-----+------+-----------------
> > > >   1 |  10 | 10.5 | true
> > > >   2 |  30 | 31.5 | false
> > > >   3 |  20 | 22.5 | true
> > > >
> > > > You are basically adding a new column called "predicted_label" to the
> > > > input table, and assigning the label for each row based on k-NN.
> > > >
> > > > We can certainly make it better by modifying the kNN function interface.
> > > > But let's just keep it simple for now and work on that later.
> > > >
> > > > NJ
> > > >
> > > > On Tue, Nov 22, 2016 at 2:52 PM, Kazmi,Auon H <[email protected]> wrote:
> > > >
> > > > >
> > > > > Hi NJ,
> > > > >
> > > > > I have implemented a first version of the interface as suggested by
> > > > > you. Right now, I am just looking at the classification task. I will
> > > > > generalize it to work for the regression task as well. I have a
> > > > > question regarding the output of the function. Should it just be the
> > > > > predicted label (or the predicted value in case of regression)? Can
> > > > > you give an example of the output?
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > Regards,
> > > > >
> > > > > Auon Haidar
> > > > >
> > > > > ________________________________
> > > > > From: Kazmi,Auon H <[email protected]>
> > > > > Sent: Friday, November 18, 2016 3:16:00 AM
> > > > > To: [email protected]
> > > > > Subject: Re: Adding KNN to madlib
> > > > >
> > > > > Hi NJ,
> > > > >
> > > > > Thanks for your inputs!
> > > > >
> > > > > I will go through every one of them and try to incorporate them.
> > > > >
> > > > >
> > > > >
> > > > > Regards,
> > > > >
> > > > > Auon Haidar
> > > > >
> > > > > ________________________________
> > > > > From: Nandish Jayaram <[email protected]>
> > > > > Sent: Wednesday, November 16, 2016 2:29:05 PM
> > > > > To: [email protected]
> > > > > Subject: Re: Adding KNN to madlib
> > > > >
> > > > > Hi Auon,
> > > > >
> > > > > Defining the interface is a good start for k-NN. I have slightly
> > > > > modified your interface to help it conform with other MADlib
> > > > > algorithms' interfaces. Note that the output for each new data point
> > > > > is not its 'k' nearest neighbors, but the result of a classification
> > > > > or regression task on the data point based on its 'k' nearest
> > > > > neighbors. Every data point in the training data will have an
> > > > > associated class label (or regression value) in a different column.
> > > > > Normally, the column containing the data point itself is called the
> > > > > independent variable, and the column containing the class label is
> > > > > called the dependent variable. For classification, you take a majority
> > > > > vote of the class labels of the 'k' nearest neighbors; for regression,
> > > > > you average the dependent variable values of the 'k' nearest
> > > > > neighbors. Here is a preliminary interface we could start with:
> > > > >
> > > > > knn(
> > > > > source_table, -- TEXT, name of table containing training data.
> > > > > new_data_table, -- TEXT, name of table containing new data on which
> > > > >     classification or regression has to be performed. Classification
> > > > >     or regression can be performed based on the type of
> > > > >     "dependent_varname".
> > > > > output_table, -- TEXT, name of the table where output predictors are
> > > > >     written. If this table is already present, an error is returned.
> > > > > dependent_varname, -- TEXT, name of the dependent variable column. If
> > > > >     this column is of type boolean/integer, we could probably perform
> > > > >     k-NN classification, and perform k-NN regression if this is of
> > > > >     type double.
> > > > > independent_varname, -- TEXT, column defining data points. Data points
> > > > >     can be of type SVEC or any type convertible to SVEC such as
> > > > >     float[] or integer[].
> > > > > k, -- INTEGER, (optional, default value could be some odd number, say
> > > > >     5) number of neighbors to consider
> > > > > metric, -- TEXT, (optional, default value could be what you are using
> > > > >     now for distance) the distance metric to use.
> > > > > );
> > > > >
> > > > > For now you can just use the distance metric you had mentioned in an
> > > > > earlier email. Note that the source_table and new_data_table are tables
> > > > > in the database and not files.
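
The majority-vote and averaging rules described above can be sketched in a few lines of plain Python. This is only an in-memory illustration under stated assumptions, not MADlib code: knn_predict, squared_distance, the classify flag, and the sample points are all hypothetical, and a real module would read source_table and new_data_table from the database.

```python
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_predict(train_points, train_values, new_point, k=5, classify=True):
    """Predict a label (classification) or value (regression) via k-NN."""
    # Rank the training rows by distance to the new point, keep the k closest.
    ranked = sorted(
        zip(train_points, train_values),
        key=lambda pair: squared_distance(pair[0], new_point),
    )
    neighbors = [value for _, value in ranked[:k]]
    if classify:
        # Classification: majority vote over the neighbors' class labels.
        return Counter(neighbors).most_common(1)[0][0]
    # Regression: average of the neighbors' dependent-variable values.
    return sum(neighbors) / len(neighbors)


# Hypothetical training data: independent variable (points) and two
# alternative dependent variables (class labels, regression values).
points = [[1.0, 1.0], [1.5, 2.0], [2.0, 1.0], [8.0, 8.0], [9.0, 9.5]]
labels = [True, True, True, False, False]
values = [10.0, 20.0, 30.0, 100.0, 110.0]

predicted_label = knn_predict(points, labels, [1.2, 1.1], k=3)
predicted_value = knn_predict(points, values, [1.2, 1.1], k=3, classify=False)
print(predicted_label, predicted_value)
```

An odd k, as suggested for the default, avoids tied votes when there are only two classes.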
> > > > >
> > > > > Some pointers to help you start off with the implementation:
> > > > > - https://cwiki.apache.org/confluence/display/MADLIB/Quick+Start+Guide+for+Developers
> > > > >   is a very useful resource with a great hello-world example. It gives
> > > > >   you details about how to add a new module (k-NN would be a new
> > > > >   module) to MADlib.
> > > > > - k-NN is a great candidate for parallelizing. Do try to use UDA (User
> > > > >   Defined Aggregates) in your implementation. This will require you to
> > > > >   add a C++ layer too, along with the SQL and python layers. Feel free
> > > > >   to ask specific questions about this after you have tried out the
> > > > >   hello world example.
> > > > > - Chapter 1 in http://madlib.incubator.apache.org/design.pdf gives you
> > > > >   more information regarding the C++ abstraction layer in MADlib.
> > > > >
> > > > > Feel free to shout out for help if you are stuck! Cheers. :)
> > > > >
> > > > > NJ
> > > > >
> > > > > On Tue, Nov 15, 2016 at 2:56 PM, Kazmi,Auon H <[email protected]> wrote:
> > > > >
> > > > > > Hi Frank and NJ,
> > > > > >
> > > > > > Thanks for your comments. I will go through the suggestions provided
> > > > > > by NJ.
> > > > > >
> > > > > > Current interface of KNN is as follows:
> > > > > >
> > > > > > 1) Input:
> > > > > >
> > > > > >        - Name of table having all the data points in n-dimensional
> > > > > >          vector form (Double Precision[ ])
> > > > > >
> > > > > >        - Column-name of these data points
> > > > > >
> > > > > >        - Name of file having that n-dim vector (v, say) whose
> > > > > >          k-nearest neighbours need to be found from first table
> > > > > >          (Double Precision[ ])
> > > > > >
> > > > > >        - Column name having this vector
> > > > > >
> > > > > >        - value of 'k'
> > > > > >
> > > > > >
> > > > > > It returns 'k' nearest neighbours of vector v from first table having
> > > > > > data points.
> > > > > >
> > > > > >
> > > > > >
> > > > > > For now, I am using madlib's squared norm function to calculate
> > > > > > distance between any two vectors. I will try to generalise that.
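
A small sketch of why a squared norm is enough for finding neighbours: the square root is monotonic, so ranking by squared distance gives the same order as ranking by Euclidean distance. The functions below are plain-Python stand-ins with hypothetical names (squared_dist, euclidean_dist) and made-up points; they are not MADlib's own function.

```python
import math


def squared_dist(a, b):
    """Squared Euclidean (L2) distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def euclidean_dist(a, b):
    """Euclidean distance: the square root of the squared distance."""
    return math.sqrt(squared_dist(a, b))


# Hypothetical data points and query vector v.
points = [[0.0, 0.0], [3.0, 4.0], [1.0, 1.0], [6.0, 8.0]]
v = [0.0, 0.5]

# Indices of the points ordered from nearest to farthest, under each metric.
by_squared = sorted(range(len(points)), key=lambda i: squared_dist(points[i], v))
by_euclid = sorted(range(len(points)), key=lambda i: euclidean_dist(points[i], v))

k = 2
nearest_k = by_squared[:k]  # the k nearest neighbours of v
print(by_squared, by_euclid, nearest_k)
```

Generalising to other metrics would mean swapping the key function, since only the ordering of distances matters for picking the k neighbours.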
> > > > > >
> > > > > >
> > > > > > Please suggest any other improvements.
> > > > > >
> > > > > >
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Auon Haidar
> > > > > >
> > > > > > ________________________________
> > > > > > From: Frank McQuillan <[email protected]>
> > > > > > Sent: Tuesday, November 15, 2016 1:30:53 PM
> > > > > > To: [email protected]
> > > > > > Subject: Re: Adding KNN to madlib
> > > > > >
> > > > > > Auon,
> > > > > >
> > > > > > Thanks for working on kNN for MADlib. Can you expand a little bit on
> > > > > > your note, and post the interface that you are thinking about and a
> > > > > > description of the arguments? Then people can comment on that.
> > > > > >
> > > > > > Thanks,
> > > > > > Frank
> > > > > >
> > > > > > On Tue, Nov 15, 2016 at 9:30 AM, Nandish Jayaram <[email protected]> wrote:
> > > > > >
> > > > > > > Hi Auon,
> > > > > > >
> > > > > > > Great going with your first version of the k-NN implementation.
> > > > > > > Some useful links for coding guidelines are at (see Developer
> > > > > > > Documentation):
> > > > > > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=61319606
> > > > > > > MADlib has something called install-checks for basic testing. You
> > > > > > > can look at any existing module for an example. For instance, check
> > > > > > > out the install-check code for k-means at:
> > > > > > > https://github.com/apache/incubator-madlib/tree/master/src/ports/postgres/modules/kmeans/test
> > > > > > >
> > > > > > > I am sure others will pitch in to help you more with your other
> > > > > > > questions, but these are some starters you can consider! Good luck!
> > > > > > >
> > > > > > > NJ
> > > > > > >
> > > > > > > On Mon, Nov 14, 2016 at 10:41 PM, Kazmi,Auon H <[email protected]> wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > I am a first-year Computer Science graduate student at the
> > > > > > > > University of Florida working on implementing KNN in MADlib. I am
> > > > > > > > ready with a first version of it, but I don't know how to proceed
> > > > > > > > with testing and adding it to the MADlib platform. Also, I am not
> > > > > > > > clear on what standards I have to follow in the final
> > > > > > > > implementation. My current version asks for the table name and
> > > > > > > > column name having the vectors in which I have to find the
> > > > > > > > neighbours. The other table given as input holds the vector whose
> > > > > > > > K-NN needs to be found. It assumes the Euclidean distance metric
> > > > > > > > for distance calculation. It would really help if somebody could
> > > > > > > > share ideas on what can be added to this functionality.
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > Regards,
> > > > > > > >
> > > > > > > > Auon Haidar Kazmi
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
