As I said, in methodology you can pick _anything_ that you think has merit and is not yet on the roadmap or already done.
For example, do you feel like you might research PSVM or interior point SVM? Actually, any flavor of non-linear SVM that is different from a simple hinge loss? Do you think you can fit it in our algebraic engine?

I think we also need a fair amount of porting of MR methods -- like seq2sparse and cvb0 LDA.

I would still look at framework performance tasks; they are badly needed. Just today I listened to a talk about a flyby matrix multiplication approach for Spark for medium-sized matrices, which probably beats ours, since even though we do not use cartesian (god forbid), our implementation is somewhat closer to what the speaker described as a "massively mapside join" -- which, according to him, is eventually supposed to gain over the flyby multiply, but there is a fair number of tasks where it does not.

Similarly, bolting on hardware libraries for in-core operations is still a big undecided issue.

Unfortunately, a lot of known outstanding issues are still about engineering.

On Mon, Jun 15, 2015 at 10:27 PM, Rohit Shinde <[email protected]> wrote:

> I would prefer some methodology work if it falls within my capabilities. If
> it doesn't then your suggestion is a good one and I'll take it up.
> Substantial according to me means a task where I can get quite familiar
> with as much of the code base as possible.
>
> On Tue, Jun 16, 2015 at 10:49 AM, Dmitriy Lyubimov <[email protected]>
> wrote:
>
> > I gave you 3 types of problems. Define substantial.
> >
> > Say, does fixing mahout spark shell sound substantial enough?
> >
> > On Mon, Jun 15, 2015 at 10:11 PM, Rohit Shinde <[email protected]>
> > wrote:
> >
> > > So do you have any suggestions for getting started? I would like to
> > > contribute to something substantial that is going on, after getting
> > > familiar with the required part of the codebase.
> > >
> > > On Mon, Jun 15, 2015 at 11:39 PM, Dmitriy Lyubimov <[email protected]>
> > > wrote:
> > >
> > > > i don't think there's a formal list published anywhere.
> > > >
> > > > There is an informal roadmap.
> > > >
> > > > The contributions, the way i see it, can mainly be in 3 areas: (1)
> > > > project support issues like for example fixing shell compatibility
> > > > with spark 1.3; (2) framework support problems like for example
> > > > performance and integrating 3rd party hardware accelerated linalg
> > > > libraries; (3) methodology work.
> > > >
> > > > We have some pending items for (1) and (2) i think but for
> > > > methodology items (3) we simply can't compile the list of everything
> > > > that can possibly be done and contributed. We just don't have that
> > > > much expertise, combined. No one has [1]. The way it works is usually
> > > > people would come up with pieces that they were missing on their own
> > > > for some reason; and they need to propose methodology,
> > > > parallelization strategy, maybe even a code sketch -- that all will
> > > > be fine.
> > > >
> > > > [1] http://matt.might.net/articles/phd-school-in-pictures/
> > > >
> > > > On Sun, Jun 14, 2015 at 11:49 PM, Rohit Shinde <[email protected]>
> > > > wrote:
> > > >
> > > > > But is there a list of projects that new people could take up?
> > > > > Even I am a student interested in contributing to the machine
> > > > > learning and data mining parts of Apache Mahout.
> > > > >
> > > > > I am familiar with Scala and Java, Python and C++.
> > > > >
> > > > > What can I contribute to?
> > > > >
> > > > > On Mon, Jun 15, 2015 at 10:24 AM, Dmitriy Lyubimov
> > > > > <[email protected]> wrote:
> > > > >
> > > > > > Well we are predominantly Scala shop now. Being fluent in Scala
> > > > > > seems like one prerequisite.
> > > > > >
> > > > > > On Sat, Jun 13, 2015 at 1:17 AM, Sreenivas Raghavan
> > > > > > <[email protected]> wrote:
> > > > > >
> > > > > > > Hello everyone,
> > > > > > > I am interested in contributing to mahout project.
> > > > > > > I am interested in algorithms, machine learning and linear
> > > > > > > algebra. Please give me some idea as where to start and how to
> > > > > > > start. I know python and some parts of Java, so please tell me
> > > > > > > is this knowledge of languages enough for writing and
> > > > > > > optimizing codes
> > > > > > > --
> > > > > > >
> > > > > > > *With Regards,*
> > > > > > > *K.S.Sreenivasa Raghavan*
