(1) Yes, making the Spark shell work with Spark 1.3+ on 0.11-snapshot would be an awesome help.

(2) I was thinking, if you are still into math problems: we have, in my view, a problem in CholeskyDecomposition. This needs a little research. It involves the methods solveRight and solveLeft.

(2a) solveLeft claims to do forward substitution (which it does), and solveRight claims to do back substitution, which it probably does too. But in reality solveRight solves a different problem than the one it is supposed to. In the classic scheme of things, if A in AX=B is positive (semi)definite and A=LL' is its Cholesky decomposition, then forward substitution is supposed to solve LY=B for Y, and back substitution is supposed to solve L'X=Y. That is, back substitution is supposed to compute (L')^-1 Y. But the current implementation does something that can be shown to be essentially equivalent to solveLeft() rather than solving L'X=Y. This needs to be looked at more carefully.

(2b) I also believe the names solveLeft and solveRight themselves are misleading. In all other cases, solve() methods traditionally denote solving AX=B or XA=B for X. In Cholesky, neither of these methods actually provides a solution of AX=B; each provides only part of it. Therefore, I think these methods should be renamed to something like forwardSubs() and backSubs(). Better yet, they should get names that say exactly what they compute, e.g. computeLtInvZ(mxZ:Matrix). Moreover, it would probably be beneficial to have solve methods that actually compute the full solution of Ax=b or xA=b' by combining forward and back substitution properly.

I hope some of this fits; it takes time to write this.

-Dmitriy

On Tue, Jun 16, 2015 at 4:17 AM, Rohit Shinde <[email protected]> wrote:

> Okay, it seems that methodology is a bit too advanced for me. I would go with framework/engineering tasks. So should I start with fixing the Mahout Spark shell?
>
> On Tue, Jun 16, 2015 at 11:20 AM, Dmitriy Lyubimov <[email protected]> wrote:
>
> > As I said, in methodology you can pick _anything_ that you think has merit and is not yet on the roadmap or done.
> >
> > For example, do you feel like you might research PSVM or interior point SVM?
> > Actually, any flavor of non-linear SVM that is different from a simple hinge loss?
> > Do you think you can fit it in our algebraic engine?
> >
> > I think we also need a fair amount of porting of MR methods -- like seq2sparse and cvb0 LDA.
> >
> > I would still look at framework performance tasks; they are badly needed. Just today I listened to a talk about a flyby matrix multiplication approach for Spark for medium-sized matrices, which probably beats ours. Even though we do not use cartesian (god forbid), our implementation is somewhat closer to what the speaker described as "massively mapside join" -- which eventually, according to him, is supposed to gain over flyby multiply, but there is a fair number of tasks where it does not.
> >
> > Similarly, bolting on hardware libraries for in-core operations is still a big undecided issue.
> >
> > Unfortunately, a lot of the known outstanding issues are still about engineering.
> >
> > On Mon, Jun 15, 2015 at 10:27 PM, Rohit Shinde <[email protected]> wrote:
> >
> > > I would prefer some methodology work if it falls within my capabilities. If it doesn't, then your suggestion is a good one and I'll take it up. Substantial, to me, means a task through which I can get familiar with as much of the code base as possible.
> > >
> > > On Tue, Jun 16, 2015 at 10:49 AM, Dmitriy Lyubimov <[email protected]> wrote:
> > >
> > > > I gave you 3 types of problems. Define substantial.
> > > >
> > > > Say, does fixing the Mahout Spark shell sound substantial enough?
> > > >
> > > > On Mon, Jun 15, 2015 at 10:11 PM, Rohit Shinde <[email protected]> wrote:
> > > >
> > > > > So do you have any suggestions for getting started? I would like to contribute to something substantial that is going on, after getting familiar with the required part of the codebase.
> > > > >
> > > > > On Mon, Jun 15, 2015 at 11:39 PM, Dmitriy Lyubimov <[email protected]> wrote:
> > > > >
> > > > > > I don't think there's a formal list published anywhere.
> > > > > >
> > > > > > There is an informal roadmap.
> > > > > >
> > > > > > The way I see it, contributions can mainly be in 3 areas:
> > > > > > (1) project support issues, for example fixing shell compatibility with Spark 1.3;
> > > > > > (2) framework support problems, for example performance and integrating 3rd-party hardware-accelerated linalg libraries;
> > > > > > (3) methodology work.
> > > > > >
> > > > > > We have some pending items for (1) and (2), I think, but for methodology items (3) we simply can't compile a list of everything that could possibly be done and contributed. We just don't have that much expertise, combined. No one has [1]. The way it usually works is that people come up with pieces they were missing on their own for some reason. They need to propose the methodology, a parallelization strategy, maybe even a code sketch -- that will all be fine.
> > > > > >
> > > > > > [1] http://matt.might.net/articles/phd-school-in-pictures/
> > > > > >
> > > > > > On Sun, Jun 14, 2015 at 11:49 PM, Rohit Shinde <[email protected]> wrote:
> > > > > >
> > > > > > > But is there a list of projects that new people could take up? I, too, am a student interested in contributing to the machine learning and data mining parts of Apache Mahout.
> > > > > > >
> > > > > > > I am familiar with Scala, Java, Python and C++.
> > > > > > >
> > > > > > > What can I contribute to?
> > > > > > >
> > > > > > > On Mon, Jun 15, 2015 at 10:24 AM, Dmitriy Lyubimov <[email protected]> wrote:
> > > > > > >
> > > > > > > > Well, we are predominantly a Scala shop now. Being fluent in Scala seems like one prerequisite.
> > > > > > > >
> > > > > > > > On Sat, Jun 13, 2015 at 1:17 AM, Sreenivas Raghavan <[email protected]> wrote:
> > > > > > > >
> > > > > > > > > Hello everyone,
> > > > > > > > > I am interested in contributing to the Mahout project. I am interested in algorithms, machine learning and linear algebra. Please give me some idea of where to start and how to start. I know Python and some parts of Java, so please tell me whether this knowledge of languages is enough for writing and optimizing code.
> > > > > > > > > --
> > > > > > > > > *With Regards,*
> > > > > > > > > *K.S.Sreenivasa Raghavan*
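Points (2a) and (2b) of the reply at the top of this thread describe the classic Cholesky scheme. The following is a minimal pure-Python sketch of that scheme, not Mahout's code. It assumes a symmetric positive definite A and does no pivoting.

- forward_subs solves LY = B.
- back_subs solves L'X = Y by reading L' as L[k][i] without forming the transpose.
- solve chains the two to return the full solution of AX = B.

The names cholesky, forward_subs, back_subs, solve and solve_right are hypothetical. They loosely follow the forwardSubs()/backSubs() renaming proposed in (2b) and are not part of the CholeskyDecomposition API.

```python
import math
from typing import List

Matrix = List[List[float]]

# Hypothetical helper names for illustration; not Mahout's CholeskyDecomposition API.


def cholesky(a: Matrix) -> Matrix:
    """Return lower-triangular L with A = L L'. Assumes A is SPD; no pivoting."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def forward_subs(L: Matrix, b: Matrix) -> Matrix:
    """Forward substitution: solve L Y = B for Y, rows top to bottom."""
    n, m = len(L), len(b[0])
    y = [[0.0] * m for _ in range(n)]
    for c in range(m):
        for i in range(n):
            s = sum(L[i][k] * y[k][c] for k in range(i))
            y[i][c] = (b[i][c] - s) / L[i][i]
    return y


def back_subs(L: Matrix, y: Matrix) -> Matrix:
    """Back substitution: solve L' X = Y for X, rows bottom to top."""
    n, m = len(L), len(y[0])
    x = [[0.0] * m for _ in range(n)]
    for c in range(m):
        for i in reversed(range(n)):
            # Row i of L' is column i of L, i.e. L'[i][k] == L[k][i].
            s = sum(L[k][i] * x[k][c] for k in range(i + 1, n))
            x[i][c] = (y[i][c] - s) / L[i][i]
    return x


def transpose(m: Matrix) -> Matrix:
    return [list(row) for row in zip(*m)]


def solve(L: Matrix, b: Matrix) -> Matrix:
    """Full solution of A X = B: forward substitution, then back substitution."""
    return back_subs(L, forward_subs(L, b))


def solve_right(L: Matrix, b: Matrix) -> Matrix:
    """Full solution of X A = B; since A = A', X' = A^-1 B'."""
    return transpose(solve(L, transpose(b)))
```

For example, with A = [[4.0, 2.0], [2.0, 3.0]] and b = [[2.0], [1.0]], solve(cholesky(A), b) returns [[0.5], [0.0]]. solve_right relies on A being symmetric: XA = B is equivalent to AX' = B', so it reuses solve on the transposes rather than needing a separate substitution routine.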
