Awesome!
On Mon, Mar 28, 2016 at 9:18 PM, Frank McQuillan <[email protected]> wrote:
> Thanks Roman. I was able to do it just now.
>
> Frank
>
> On Mon, Mar 28, 2016 at 9:12 PM, Roman Shaposhnik <[email protected]> wrote:
>>
>> I can help with that -- stay tuned.
>>
>> On Mon, Mar 28, 2016 at 8:29 PM, Frank McQuillan <[email protected]> wrote:
>> > Let me figure out how to do this and add Aditya as the owner of that JIRA.
>> > My initial attempts in ASF infra-land were not quite successful.
>> >
>> > Frank
>> >
>> > On Mon, Mar 28, 2016 at 4:54 PM, Rahul Iyer <[email protected]> wrote:
>> >>
>> >> @Frank, Roman: I believe Aditya needs to be added as a developer to the
>> >> MADlib project before a JIRA can be assigned to him. Is this only
>> >> available to the lead/owner?
>> >>
>> >> On Mon, Mar 28, 2016 at 3:49 PM, Aditya Nain <[email protected]> wrote:
>> >>>
>> >>> Hi Rahul,
>> >>>
>> >>> I didn't have an ID, so I just created one.
>> >>> My ID is: Aditya Nain
>> >>>
>> >>> Thanks,
>> >>> Aditya
>> >>>
>> >>> On Mon, Mar 28, 2016 at 6:40 PM, Rahul Iyer <[email protected]> wrote:
>> >>>
>> >>> > I can assign this to you, but you need to have an account on
>> >>> > https://issues.apache.org.
>> >>> > If you already have an account, please send your ID; I wasn't able
>> >>> > to find you by your name alone.
>> >>> >
>> >>> > On Mon, Mar 28, 2016 at 3:31 PM, Aditya Nain <[email protected]> wrote:
>> >>> >
>> >>> > > Hi Rahul,
>> >>> > >
>> >>> > > Thanks for the reply!
>> >>> > >
>> >>> > > I am working on implementing a Gaussian Mixture Model, assuming that
>> >>> > > the covariance matrix is the same for all the Gaussians.
>> >>> > > The JIRA that deals with GMM is MADLIB-410:
>> >>> > >
>> >>> > > https://issues.apache.org/jira/browse/MADLIB-410?jql=project%20%3D%20MADLIB
>> >>> > >
>> >>> > > Can this be assigned to me, or how do I get it assigned to me?
>> >>> > >
>> >>> > > Thanks,
>> >>> > > Aditya
>> >>> > >
>> >>> > > On Mon, Mar 21, 2016 at 3:41 PM, Rahul Iyer <[email protected]> wrote:
>> >>> > >
>> >>> > > > Hi Aditya,
>> >>> > > >
>> >>> > > > Welcome to the MADlib community!
>> >>> > > >
>> >>> > > > Gaussian mixture models are extremely useful, and we would heartily
>> >>> > > > welcome a contribution for them. The SQLEM paper might be
>> >>> > > > oversimplifying the capabilities of the database (e.g. its
>> >>> > > > assumption that there is no array type is unnecessary for
>> >>> > > > PostgreSQL). You could speed things up (both development time and
>> >>> > > > execution time) by writing some of the functions in C++. K-means is
>> >>> > > > an example of how clustering is implemented in MADlib.
>> >>> > > > IMO, assuming the same covariance matrix is reasonable. We could
>> >>> > > > extend the capabilities after the initial implementation is complete.
>> >>> > > >
>> >>> > > > There was some work started a long time ago that built perceptrons
>> >>> > > > using the convex framework (link
>> >>> > > > <https://github.com/iyerr3/madlib/tree/mlp>).
>> >>> > > > There are still some bugs in that code, since the trained network
>> >>> > > > isn't converging. You could start there or build a new module;
>> >>> > > > either way, an MLP module is frequently requested by the data
>> >>> > > > science community.
>> >>> > > >
>> >>> > > > I would suggest starting with Gaussian mixtures and then moving to
>> >>> > > > perceptrons once the GMM work is completed.
>> >>> > > >
>> >>> > > > Feel free to ask questions on this forum. Looking forward to
>> >>> > > > collaborating with you.
>> >>> > > >
>> >>> > > > Best,
>> >>> > > > Rahul
>> >>> > > >
>> >>> > > > On Thu, Mar 17, 2016 at 2:08 PM, Aditya Nain <[email protected]> wrote:
>> >>> > > >
>> >>> > > > > Hi,
>> >>> > > > >
>> >>> > > > > My name is Aditya Nain, and I am a graduate student at the
>> >>> > > > > University of Florida.
>> >>> > > > > I have been learning MADlib for a while and would like to
>> >>> > > > > contribute to it.
>> >>> > > > > I went through some of the open stories in JIRA and started
>> >>> > > > > working on MADLIB-410:
>> >>> > > > >
>> >>> > > > > https://issues.apache.org/jira/browse/MADLIB-410?jql=project%20%3D%20MADLIB
>> >>> > > > >
>> >>> > > > > which is about implementing a Gaussian Mixture Model using the
>> >>> > > > > Expectation Maximization (EM) algorithm.
>> >>> > > > >
>> >>> > > > > I came across the following paper while searching for a
>> >>> > > > > distributed EM algorithm that could be implemented in MADlib:
>> >>> > > > >
>> >>> > > > > Carlos Ordonez, Paul Cereghini, "SQLEM: fast clustering in SQL
>> >>> > > > > using the EM algorithm", ACM SIGMOD Record, Volume 29, Issue 2,
>> >>> > > > > June 2000, pages 559-570.
>> >>> > > > >
>> >>> > > > > http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.28.7564
>> >>> > > > >
>> >>> > > > > I thought of implementing the approach discussed in the paper,
>> >>> > > > > but the paper assumes that the covariance matrix is the same for
>> >>> > > > > all the clusters (i.e. all the Gaussian distributions share one
>> >>> > > > > covariance matrix). So I wanted to ask the community whether it's
>> >>> > > > > fine to go with the assumption made in the paper and implement it
>> >>> > > > > in MADlib.
>> >>> > > > >
>> >>> > > > > Also, MADlib currently doesn't have an implementation of a
>> >>> > > > > perceptron, nor did I find any open story related to it in JIRA.
>> >>> > > > > I came across the following paper, which describes a distributed
>> >>> > > > > training algorithm for the perceptron:
>> >>> > > > >
>> >>> > > > > Ryan McDonald, Keith Hall, Gideon Mann, "Distributed training
>> >>> > > > > strategies for the structured perceptron"
>> >>> > > > > http://dl.acm.org/citation.cfm?id=1858068
>> >>> > > > >
>> >>> > > > > Would it be useful to have a distributed implementation of the
>> >>> > > > > perceptron in MADlib?
>> >>> > > > >
>> >>> > > > > Thanks,
>> >>> > > > > Aditya
