IIRC that paper talks about MapReduce on a shared-memory system, not on a shared-nothing system such as the Hadoop implementation.
As a rule of thumb, iterations in Hadoop are about 10x slower than in systems such as Giraph, Spark or Stratosphere.

--sebastian

On 07.01.2014 22:01, Oleksandr Olgashko wrote:
> What can you say about
> http://www.cs.stanford.edu/people/ang//papers/nips06-mapreducemulticore.pdf?
>
> 2014/1/7 Dmitriy Lyubimov <[email protected]>
>
>> yes. Create working notes how exactly to do that. (Or, what i am a bit
>> pushing you towards, Spark, since MR is not really iteration friendly
>> platform and it looks like iterations are needed in fastICA.).
>>
>> On Tue, Jan 7, 2014 at 12:38 PM, Oleksandr Olgashko <[email protected]> wrote:
>>
>>> So the problem is to adapt ICA for MR, am i right?
>>>
>>> 2014/1/7 Dmitriy Lyubimov <[email protected]>
>>>
>>>> i already looked at fast ICA. while it claims to be parallel, this work
>>>> doesn't exactly map it into map reduce (or spark) paradigm and from what i
>>>> can recollect still implies outer iterations for fitting principal
>>>> component vectors one by one. Which means it probably already is
>>>> MR-unfriendly by construction; Spark may show far better promise here but
>>>> still a working notes document is required to show how exactly. that's what
>>>> i mean.
>>>>
>>>> On Tue, Jan 7, 2014 at 1:35 AM, Oleksandr Olgashko <[email protected]> wrote:
>>>>
>>>>> Could you please take a look on this article?
>>>>> http://cran.r-project.org/web/packages/fastICA/fastICA.pdf
>>>>> I have learned that re-inventing the wheel is wrong for most problems, and
>>>>> usually exists a better solution. However, it often needs some "grinding",
>>>>> so I may research those ways, in case of approval.
>>>>>
>>>>> About Scala: unfortunately, I have never worked with this language before,
>>>>> but wanted to. I'd like to fill that gap in my skills, but I don't know
>>>>> exactly where to start.
>>>>>
>>>>> 2014/1/7 Dmitriy Lyubimov <[email protected]>
>>>>>
>>>>>> ICA is a very useful technique for dimensionality reduction. I believe
>>>>>> Mahout would benefit from it; however challenges are fairly significant in
>>>>>> terms of proven parallelization technique and acceptable efficacy, which
>>>>>> makes it hard to just "implement" (I am not familiar at this point with any
>>>>>> concrete work on parallel ICA). So like i said before i am not very
>>>>>> hopeful. However, if one never tries, then nothing will get ever done. who
>>>>>> knows.
>>>>>>
>>>>>> On Mon, Jan 6, 2014 at 2:18 PM, Isabel Drost-Fromm <[email protected]> wrote:
>>>>>>
>>>>>>> On Mon, Jan 06, 2014 at 10:40:45PM +0200, Oleksandr Olgashko wrote:
>>>>>>>> Returning back to question about theme to work, asked 2 months ago.
>>>>>>>> What algorithm should I implement?
>>>>>>>
>>>>>>> To be quite frank with you: None. Personally I'd rather see improvements
>>>>>>> (in terms of documentation, integration, stableisation, performance
>>>>>>> optimisation) of the existing Mahout source.
>>>>>>>
>>>>>>> Feel free to take a closer look at the thread concerning "getting
>>>>>>> involved" that we had around Christmas last year for inspiration.
>>>>>>>
>>>>>>> Isabel
