@Sebastian, wanna post a link?
On Tue, Jan 7, 2014 at 2:46 PM, Sebastian Schelter <[email protected]> wrote:

> I also have some spark cooccurrence analysis code lying around that
> might be a nice contribution.
>
> On 07.01.2014 23:44, Dmitriy Lyubimov wrote:
> > if you want to contribute to Mahout, obviously you want to speak to
> > Mahout dev audience. Spark is not yet officially integrated into
> > Mahout, but we are actively contemplating it and I have been doing
> > some work off SVN e.g.
> > https://issues.apache.org/jira/browse/MAHOUT-1346,
> > https://issues.apache.org/jira/browse/MAHOUT-1365 and some other
> > algorithm ports.
> >
> > On Tue, Jan 7, 2014 at 1:30 PM, Oleksandr Olgashko <[email protected]> wrote:
> >
> >> Didn't work with Spark before (just read their overview page).
> >> Should i ask arising questions here or better switch to Spark's
> >> mailing lists?
> >>
> >> 2014/1/7 Sebastian Schelter <[email protected]>
> >>
> >>> IIRC that papers talks about MapReduce on a shared-memory system,
> >>> not on a shared-nothing system such as the Hadoop implementation.
> >>>
> >>> As a rule of thumb, iterations in Hadoop are about 10x slower than
> >>> in systems such as Giraph, Spark or Stratosphere.
> >>>
> >>> --sebastian
> >>>
> >>> On 07.01.2014 22:01, Oleksandr Olgashko wrote:
> >>>> What can you say about
> >>>> http://www.cs.stanford.edu/people/ang//papers/nips06-mapreducemulticore.pdf
> >>>> ?
> >>>>
> >>>> 2014/1/7 Dmitriy Lyubimov <[email protected]>
> >>>>
> >>>>> yes. Create working notes how exactly to do that. (Or, what i am
> >>>>> a bit pushing you towards, Spark, since MR is not really
> >>>>> iteration friendly platform and it looks like iterations are
> >>>>> needed in fastICA.).
> >>>>>
> >>>>> On Tue, Jan 7, 2014 at 12:38 PM, Oleksandr Olgashko <[email protected]> wrote:
> >>>>>
> >>>>>> So the problem is to adapt ICA for MR, am i right?
> >>>>>>
> >>>>>> 2014/1/7 Dmitriy Lyubimov <[email protected]>
> >>>>>>
> >>>>>>> i already looked at fast ICA. while it claims to be parallel,
> >>>>>>> this work doesn't exactly map it into map reduce (or spark)
> >>>>>>> paradigm and from what i can recollect still implies outer
> >>>>>>> iterations for fitting principal component vectors one by one.
> >>>>>>> Which means it probably already is MR-unfriendly by
> >>>>>>> construction; Spark may show far better promise here but still
> >>>>>>> a working notes document is required to show how exactly.
> >>>>>>> that's what i mean.
> >>>>>>>
> >>>>>>> On Tue, Jan 7, 2014 at 1:35 AM, Oleksandr Olgashko <[email protected]> wrote:
> >>>>>>>
> >>>>>>>> Could you please take a look on this article?
> >>>>>>>> http://cran.r-project.org/web/packages/fastICA/fastICA.pdf
> >>>>>>>> I have learned that re-inventing the wheel is wrong for most
> >>>>>>>> problems, and usually exists a better solution. However, it
> >>>>>>>> often needs some "grinding", so I may research those ways, in
> >>>>>>>> case of approval.
> >>>>>>>>
> >>>>>>>> About Scala: unfortunately, I have never worked with this
> >>>>>>>> language before, but wanted to. I'd like to fill that gap in
> >>>>>>>> my skills, but I don't know exactly where to start.
> >>>>>>>>
> >>>>>>>> 2014/1/7 Dmitriy Lyubimov <[email protected]>
> >>>>>>>>
> >>>>>>>>> ICA is a very useful technique for dimensionality reduction.
> >>>>>>>>> I believe Mahout would benefit from it; however challenges
> >>>>>>>>> are fairly significant in terms of proven parallelization
> >>>>>>>>> technique and acceptable efficacy, which makes it hard to
> >>>>>>>>> just "implement" (I am not familiar at this point with any
> >>>>>>>>> concrete work on parallel ICA). So like i said before i am
> >>>>>>>>> not very hopeful. However, if one never tries, then nothing
> >>>>>>>>> will get ever done. who knows.
> >>>>>>>>>
> >>>>>>>>> On Mon, Jan 6, 2014 at 2:18 PM, Isabel Drost-Fromm <[email protected]> wrote:
> >>>>>>>>>
> >>>>>>>>>> On Mon, Jan 06, 2014 at 10:40:45PM +0200, Oleksandr Olgashko wrote:
> >>>>>>>>>>> Returning back to question about theme to work, asked 2
> >>>>>>>>>>> months ago.
> >>>>>>>>>>> What algorithm should I implement?
> >>>>>>>>>>
> >>>>>>>>>> To be quite frank with you: None. Personally I'd rather see
> >>>>>>>>>> improvements (in terms of documentation, integration,
> >>>>>>>>>> stableisation, performance optimisation) of the existing
> >>>>>>>>>> Mahout source.
> >>>>>>>>>>
> >>>>>>>>>> Feel free to take a closer look at the thread concerning
> >>>>>>>>>> "getting involved" that we had around Christmas last year
> >>>>>>>>>> for inspiration.
> >>>>>>>>>>
> >>>>>>>>>> Isabel
