I can help with that -- stay tuned.
On Mon, Mar 28, 2016 at 8:29 PM, Frank McQuillan <[email protected]> wrote:
> Let me figure out how to do this and add Aditya as the owner of that JIRA.
> My initial attempts in ASF infra-land were not quite successful.
>
> Frank
>
> On Mon, Mar 28, 2016 at 4:54 PM, Rahul Iyer <[email protected]> wrote:
>>
>> @Frank, Roman: I believe Aditya needs to be added as a developer to the MADlib project before a JIRA can be assigned to him. Is this only available to the lead/owner?
>>
>> On Mon, Mar 28, 2016 at 3:49 PM, Aditya Nain <[email protected]> wrote:
>>>
>>> Hi Rahul,
>>>
>>> I didn't have an id, so I created one now.
>>> My id is: Aditya Nain
>>>
>>> Thanks,
>>> Aditya
>>>
>>> On Mon, Mar 28, 2016 at 6:40 PM, Rahul Iyer <[email protected]> wrote:
>>>
>>> > I can assign this to you, but you need to have an account in https://issues.apache.org.
>>> > If you already have an account, then please send your id; I wasn't able to find you just using your name.
>>> >
>>> > On Mon, Mar 28, 2016 at 3:31 PM, Aditya Nain <[email protected]> wrote:
>>> >
>>> > > Hi Rahul,
>>> > >
>>> > > Thanks for the reply!
>>> > >
>>> > > I am working on implementing a Gaussian Mixture Model, assuming that the covariance matrix is the same for all the Gaussians.
>>> > > The JIRA that deals with GMM is MADLIB-410:
>>> > > https://issues.apache.org/jira/browse/MADLIB-410?jql=project%20%3D%20MADLIB
>>> > >
>>> > > Can this be assigned to me, or how do I get it assigned to me?
>>> > >
>>> > > Thanks,
>>> > > Aditya
>>> > >
>>> > > On Mon, Mar 21, 2016 at 3:41 PM, Rahul Iyer <[email protected]> wrote:
>>> > >
>>> > > > Hi Aditya,
>>> > > >
>>> > > > Welcome to the MADlib community!
>>> > > >
>>> > > > Gaussian mixture models are extremely useful, and we would heartily welcome a contribution for them. The SQLEM paper might be oversimplifying the capabilities of the database (e.g., the assumption that there is no array type is unnecessary for PostgreSQL). You could speed things up (both development time and execution time) by writing some of the functions in C++. K-means is an example of how clustering is implemented.
>>> > > > IMO, assuming the same covariance matrix is reasonable. We could extend the capabilities after the initial implementation is complete.
>>> > > >
>>> > > > There was some work started a long time ago that built perceptrons using the convex framework (link <https://github.com/iyerr3/madlib/tree/mlp>). There are still some bugs in that code, since the trained network isn't converging. You could start there or build a new module; either way, an MLP module is frequently requested by the data science community.
>>> > > >
>>> > > > I would suggest starting with Gaussian mixtures and then moving to perceptrons once the GMM work is completed.
>>> > > >
>>> > > > Feel free to ask questions on this forum. Looking forward to collaborating with you.
>>> > > >
>>> > > > Best,
>>> > > > Rahul
>>> > > >
>>> > > > On Thu, Mar 17, 2016 at 2:08 PM, Aditya Nain <[email protected]> wrote:
>>> > > >
>>> > > > > Hi,
>>> > > > >
>>> > > > > My name is Aditya Nain, and I am a graduate student at the University of Florida.
>>> > > > > I have been learning MADlib for a while and want to contribute to it.
>>> > > > > I went through some of the open stories in JIRA and started working on MADLIB-410:
>>> > > > > https://issues.apache.org/jira/browse/MADLIB-410?jql=project%20%3D%20MADLIB
>>> > > > > which is about implementing a Gaussian Mixture Model using the Expectation Maximization (EM) algorithm.
>>> > > > >
>>> > > > > I came across the following paper while searching for a distributed EM algorithm that can be implemented in MADlib:
>>> > > > >
>>> > > > > Carlos Ordonez, Paul Cereghini, "SQLEM: fast clustering in SQL using the EM algorithm", ACM SIGMOD Record, Volume 29, Issue 2, June 2000, pages 559-570.
>>> > > > > http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.28.7564
>>> > > > >
>>> > > > > I thought of implementing the approach discussed in the paper, but the paper assumes that the covariance matrix is the same for all the clusters (i.e., the covariance matrix is the same for all the Gaussian distributions). So, I wanted to know the community's opinion on whether it's fine to go with the assumption made in the paper and implement it in MADlib.
>>> > > > >
>>> > > > > Also, MADlib currently doesn't have an implementation of a perceptron, nor did I find any open story related to it in JIRA. I came across the following paper, which describes a distributed training algorithm for the perceptron:
>>> > > > >
>>> > > > > Ryan McDonald, Keith Hall, Gideon Mann, "Distributed training strategies for the structured perceptron"
>>> > > > > http://dl.acm.org/citation.cfm?id=1858068
>>> > > > >
>>> > > > > Would it be useful to have a distributed implementation of the perceptron in MADlib?
>>> > > > >
>>> > > > > Thanks,
>>> > > > > Aditya
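To make the shared-covariance assumption discussed in this thread concrete, here is a minimal sketch of EM for a Gaussian mixture whose components all share one covariance. It is not MADlib code and not the SQLEM algorithm itself: it is plain Python, restricted to one dimension (so the shared covariance matrix reduces to a single shared variance), and the function name em_tied_gmm_1d is hypothetical. The only difference from the usual per-component EM updates is in the M-step, where weighted squared deviations are pooled across all components instead of being estimated separately for each one.

```python
import math
from typing import List, Tuple


def em_tied_gmm_1d(
    xs: List[float], init_means: List[float], n_iter: int = 100
) -> Tuple[List[float], List[float], float]:
    """Hypothetical sketch: EM for a 1-D Gaussian mixture with one shared variance.

    Returns (mixture weights, component means, shared variance).
    """
    n = len(xs)
    k = len(init_means)
    means = list(init_means)
    weights = [1.0 / k] * k
    # Start the shared variance at the pooled sample variance of all points.
    overall_mean = sum(xs) / n
    var = sum((x - overall_mean) ** 2 for x in xs) / n

    for _ in range(n_iter):
        # E-step: resp[i][j] = P(component j | x_i), using the same variance for every j.
        norm = 1.0 / math.sqrt(2.0 * math.pi * var)
        resp = []
        for x in xs:
            dens = [w * norm * math.exp(-((x - m) ** 2) / (2.0 * var))
                    for w, m in zip(weights, means)]
            total = sum(dens)
            resp.append([d / total for d in dens])

        # M-step: per-component weights and means, as in ordinary GMM EM.
        nk = [sum(r[j] for r in resp) for j in range(k)]
        weights = [c / n for c in nk]
        means = [sum(r[j] * x for r, x in zip(resp, xs)) / nk[j] for j in range(k)]

        # Tied-covariance step: pool weighted squared deviations over ALL components.
        var = sum(r[j] * (x - means[j]) ** 2
                  for r, x in zip(resp, xs) for j in range(k)) / n

    return weights, means, var


if __name__ == "__main__":
    data = [-1.0, -0.5, 0.0, 0.5, 1.0, 9.0, 9.5, 10.0, 10.5, 11.0]
    print(em_tied_gmm_1d(data, [1.0, 8.0]))
```

On the sample data in the __main__ block (two well-separated groups), the sketch should settle on means near 0 and 10 with a single shared variance near 0.5. In d dimensions the same pooling gives the tied covariance estimate Sigma = (1/n) * sum_i sum_j r_ij (x_i - mu_j)(x_i - mu_j)^T, which is part of why the assumption is convenient: only one covariance matrix has to be estimated per iteration.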
