If you want to contribute to Mahout, you obviously want to speak to the Mahout dev audience. Spark is not yet officially integrated into Mahout, but we are actively contemplating it, and I have been doing some work off SVN, e.g. https://issues.apache.org/jira/browse/MAHOUT-1346, https://issues.apache.org/jira/browse/MAHOUT-1365, and some other algorithm ports.

On Tue, Jan 7, 2014 at 1:30 PM, Oleksandr Olgashko <[email protected]> wrote:

> Didn't work with Spark before (just read their overview page).
> Should i ask arising questions here or better switch to Spark's mailing lists?
>
> 2014/1/7 Sebastian Schelter <[email protected]>
>
> > IIRC that papers talks about MapReduce on a shared-memory system, not on a shared-nothing system such as the Hadoop implementation.
> >
> > As a rule of thumb, iterations in Hadoop are about 10x slower than in systems such as Giraph, Spark or Stratosphere.
> >
> > --sebastian
> >
> > On 07.01.2014 22:01, Oleksandr Olgashko wrote:
> > > What can you say about
> > > http://www.cs.stanford.edu/people/ang//papers/nips06-mapreducemulticore.pdf ?
> > >
> > > 2014/1/7 Dmitriy Lyubimov <[email protected]>
> > >
> > >> yes. Create working notes how exactly to do that. (Or, what i am a bit pushing you towards, Spark, since MR is not really iteration friendly platform and it looks like iterations are needed in fastICA.).
> > >>
> > >> On Tue, Jan 7, 2014 at 12:38 PM, Oleksandr Olgashko <[email protected]> wrote:
> > >>
> > >>> So the problem is to adapt ICA for MR, am i right?
> > >>>
> > >>> 2014/1/7 Dmitriy Lyubimov <[email protected]>
> > >>>
> > >>>> i already looked at fast ICA. while it claims to be parallel, this work doesn't exactly map it into map reduce (or spark) paradigm and from what i can recollect still implies outer iterations for fitting principal component vectors one by one. Which means it probably already is MR-unfriendly by construction; Spark may show far better promise here but still a working notes document is required to show how exactly. that's what i mean.
> > >>>>
> > >>>> On Tue, Jan 7, 2014 at 1:35 AM, Oleksandr Olgashko <[email protected]> wrote:
> > >>>>
> > >>>>> Could you please take a look on this article?
> > >>>>> http://cran.r-project.org/web/packages/fastICA/fastICA.pdf
> > >>>>> I have learned that re-inventing the wheel is wrong for most problems, and usually exists a better solution. However, it often needs some "grinding", so I may research those ways, in case of approval.
> > >>>>>
> > >>>>> About Scala: unfortunately, I have never worked with this language before, but wanted to. I'd like to fill that gap in my skills, but I don't know exactly where to start.
> > >>>>>
> > >>>>> 2014/1/7 Dmitriy Lyubimov <[email protected]>
> > >>>>>
> > >>>>>> ICA is a very useful technique for dimensionality reduction. I believe Mahout would benefit from it; however challenges are fairly significant in terms of proven parallelization technique and acceptable efficacy, which makes it hard to just "implement" (I am not familiar at this point with any concrete work on parallel ICA). So like i said before i am not very hopeful. However, if one never tries, then nothing will get ever done. who knows.
> > >>>>>>
> > >>>>>> On Mon, Jan 6, 2014 at 2:18 PM, Isabel Drost-Fromm <[email protected]> wrote:
> > >>>>>>
> > >>>>>>> On Mon, Jan 06, 2014 at 10:40:45PM +0200, Oleksandr Olgashko wrote:
> > >>>>>>>> Returning back to question about theme to work, asked 2 months ago.
> > >>>>>>>> What algorithm should I implement?
> > >>>>>>>
> > >>>>>>> To be quite frank with you: None. Personally I'd rather see improvements (in terms of documentation, integration, stableisation, performance optimisation) of the existing Mahout source.
> > >>>>>>>
> > >>>>>>> Feel free to take a closer look at the thread concerning "getting involved" that we had around Christmas last year for inspiration.
> > >>>>>>>
> > >>>>>>> Isabel
