Okay, it seems that methodology work is a bit too advanced for me. I would rather go with the framework/engineering tasks. So should I start by fixing the Mahout Spark shell?

On Tue, Jun 16, 2015 at 11:20 AM, Dmitriy Lyubimov <[email protected]> wrote:

> As I said, in methodology you can pick _anything_ that you think has merit
> and is not yet on the roadmap or done.
>
> For example, do you feel like you might research PSVM or interior point
> SVM? Actually, any flavor of non-linear SVM that is different from a simple
> hinge loss? Do you think you can fit it into our algebraic engine?
>
> I think we also need a fair amount of porting of MR methods, like
> seq2sparse and cvb0 LDA.
>
> I would still look at framework performance tasks; they are badly needed.
> Just today I heard about a flyby matrix multiplication approach for Spark
> for medium-sized matrices, which probably beats ours, since even though we
> do not use cartesian (god forbid), our implementation is somewhat closer to
> what the speaker described as "massively mapside join", which eventually,
> according to him, is supposed to gain over flyby multiply, but there is a
> fair number of tasks where it does not.
>
> Similarly, bolting on hardware libraries for in-core operations is still a
> big undecided issue.
>
> Unfortunately, a lot of known outstanding issues are still about
> engineering.
>
> On Mon, Jun 15, 2015 at 10:27 PM, Rohit Shinde <[email protected]> wrote:
>
> > I would prefer some methodology work if it falls within my capabilities.
> > If it doesn't, then your suggestion is a good one and I'll take it up.
> > Substantial, according to me, means a task where I can get quite familiar
> > with as much of the code base as possible.
> >
> > On Tue, Jun 16, 2015 at 10:49 AM, Dmitriy Lyubimov <[email protected]> wrote:
> >
> > > I gave you 3 types of problems. Define substantial.
> > >
> > > Say, does fixing the Mahout Spark shell sound substantial enough?
> > >
> > > On Mon, Jun 15, 2015 at 10:11 PM, Rohit Shinde <[email protected]> wrote:
> > >
> > > > So do you have any suggestions for getting started? I would like to
> > > > contribute to something substantial that is going on, after getting
> > > > familiar with the required part of the codebase.
> > > >
> > > > On Mon, Jun 15, 2015 at 11:39 PM, Dmitriy Lyubimov <[email protected]> wrote:
> > > >
> > > > > I don't think there's a formal list published anywhere.
> > > > >
> > > > > There is an informal roadmap.
> > > > >
> > > > > The contributions, the way I see it, can mainly be in 3 areas:
> > > > > (1) project support issues, like for example fixing shell
> > > > > compatibility with Spark 1.3; (2) framework support problems, like
> > > > > for example performance and integrating 3rd party hardware
> > > > > accelerated linalg libraries; (3) methodology work.
> > > > >
> > > > > We have some pending items for (1) and (2), I think, but for
> > > > > methodology items (3) we simply can't compile the list of
> > > > > everything that can possibly be done and contributed. We just
> > > > > don't have that much expertise, combined. No one has [1]. The way
> > > > > it works is usually people would come up with pieces that they
> > > > > were missing on their own for some reason; and they need to
> > > > > propose methodology, parallelization strategy, maybe even a code
> > > > > sketch -- that all will be fine.
> > > > >
> > > > > [1] http://matt.might.net/articles/phd-school-in-pictures/
> > > > >
> > > > > On Sun, Jun 14, 2015 at 11:49 PM, Rohit Shinde <[email protected]> wrote:
> > > > >
> > > > > > But is there a list of projects that new people could take up?
> > > > > > I am also a student interested in contributing to the machine
> > > > > > learning and data mining parts of Apache Mahout.
> > > > > >
> > > > > > I am familiar with Scala and Java, Python and C++.
> > > > > >
> > > > > > What can I contribute to?
> > > > > >
> > > > > > On Mon, Jun 15, 2015 at 10:24 AM, Dmitriy Lyubimov <[email protected]> wrote:
> > > > > >
> > > > > > > Well, we are predominantly a Scala shop now. Being fluent in
> > > > > > > Scala seems like one prerequisite.
> > > > > > >
> > > > > > > On Sat, Jun 13, 2015 at 1:17 AM, Sreenivas Raghavan <[email protected]> wrote:
> > > > > > >
> > > > > > > > Hello everyone,
> > > > > > > > I am interested in contributing to the Mahout project. I am
> > > > > > > > interested in algorithms, machine learning and linear
> > > > > > > > algebra. Please give me some idea as to where to start and
> > > > > > > > how to start. I know Python and some parts of Java, so
> > > > > > > > please tell me whether this knowledge of languages is
> > > > > > > > enough for writing and optimizing code.
> > > > > > > > --
> > > > > > > >
> > > > > > > > *With Regards,*
> > > > > > > > *K.S.Sreenivasa Raghavan*
