Hi Frank and NJ,
Thanks for your comments. I will go through the suggestions provided by NJ.
The current interface of KNN is as follows:
1) Input:
- Name of the table holding all the data points in n-dimensional vector form
(Double Precision[ ])
- Name of the column holding these data points
- Name of the table holding the n-dimensional vector (v, say) whose k nearest
neighbours are to be found in the first table (Double Precision[ ])
- Name of the column holding this vector
- Value of 'k'
2) Output:
It returns the 'k' nearest neighbours of vector v from the first table of
data points.
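To make the intended behaviour concrete, here is a rough sketch in plain Python. It is not the MADlib code itself: the function names `knn` and `squared_dist` are made up for this example, and an in-memory list stands in for the table of data points.

```python
import heapq

def squared_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum((x - y) ** 2 for x, y in zip(a, b))

def knn(points, v, k):
    """Return the k points closest to v, nearest first.

    points: list of n-dimensional vectors (stands in for the point table)
    v:      the query vector (stands in for the second table)
    k:      number of neighbours to return
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # nsmallest keeps only the k best candidates instead of sorting everything
    return heapq.nsmallest(k, points, key=lambda p: squared_dist(p, v))

points = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [2.0, 2.0]]
print(knn(points, [1.2, 1.2], 2))
```

If k is larger than the number of data points, this sketch simply returns all of them, sorted by distance.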
For now, I am using MADlib's squared norm function to calculate the distance
between any two vectors. I will try to generalise that.
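As a rough idea of what generalising could look like, the sketch below (plain Python, not MADlib code) passes the distance function in as a parameter. All names here (`rank_by`, `squared_euclidean`, `euclidean`, `manhattan`) are illustrative assumptions. It also shows that ranking by squared distance gives the same neighbours as ranking by Euclidean distance, since the square root does not change the ordering, while a different metric can change it.

```python
import math

def squared_euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def euclidean(a, b):
    return math.sqrt(squared_euclidean(a, b))

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def rank_by(points, v, dist):
    """Order points by the given distance function, nearest first."""
    return sorted(points, key=lambda p: dist(p, v))

points = [[3.0, 0.0], [2.0, 2.0], [0.5, 0.5]]
v = [0.0, 0.0]

# Squared and plain Euclidean distance produce the same ordering.
by_squared = rank_by(points, v, squared_euclidean)
by_euclidean = rank_by(points, v, euclidean)
# Manhattan distance swaps the two farther points in this example.
by_manhattan = rank_by(points, v, manhattan)

print(by_euclidean)
print(by_manhattan)
```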
Please suggest any other improvements.
Thanks,
Auon Haidar
________________________________
From: Frank McQuillan <[email protected]>
Sent: Tuesday, November 15, 2016 1:30:53 PM
To: [email protected]
Subject: Re: Adding KNN to madlib
Auon,
Thanks for working on kNN for MADlib. Can you expand a little bit on your
note, and post the interface that you are thinking about and description of
the arguments? Then people can comment on that.
Thanks,
Frank
On Tue, Nov 15, 2016 at 9:30 AM, Nandish Jayaram <[email protected]>
wrote:
> Hi Auon,
>
> Great going with your first version of k-NN implementation.
> Some useful links for coding guidelines are at (see Developer
> Documentation):
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=61319606
> MADlib has something called install-checks for basic testing. You can
> look at any existing module for an example. For instance, check
> out the install-check code for k-means at:
> https://github.com/apache/incubator-madlib/tree/master/src/ports/postgres/modules/kmeans/test
>
> I am sure others will pitch in to help you more with your other questions,
> but these are some starters you can consider! Good luck!
>
> NJ
>
> On Mon, Nov 14, 2016 at 10:41 PM, Kazmi,Auon H <[email protected]> wrote:
>
> > Hi,
> >
> > I am a first-year Computer Science graduate student at the University of
> > Florida working on implementing KNN in MADlib. I have a first version
> > ready, but I don't know how to proceed with testing it and adding it to
> > the MADlib platform. Also, I am not clear on which standards I have to
> > follow in the final implementation. My current version asks for the name
> > of the table and column holding the vectors among which the neighbours
> > are to be found. The other table given as input holds the vector whose
> > K-NN needs to be found. It assumes the Euclidean distance metric for the
> > distance calculation. It would really help if somebody could share ideas
> > on what can be added to this functionality.
> >
> > Regards,
> >
> > Auon Haidar Kazmi
> >
>