Re: Implementing ICA

Sebastian Schelter Tue, 07 Jan 2014 13:17:30 -0800

IIRC that paper talks about MapReduce on a shared-memory system, not on
a shared-nothing system such as the Hadoop implementation.


As a rule of thumb, iterations in Hadoop are about 10x slower than in
systems such as Giraph, Spark or Stratosphere.
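
To make the iteration cost concrete, here is a minimal sketch (not Mahout
code, and not taken from the fastICA package) of deflation-style FastICA
written so that every fixed-point update is one "map" over the rows plus one
"reduce" of partial sums. The names fastica_deflation, map_row, combine and
demo_rows are hypothetical, the tanh nonlinearity is one common choice, and
the input is assumed to be already centered and whitened. On Hadoop each
counted pass would roughly correspond to one job.

```python
import math
import random
from functools import reduce


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [c / norm for c in v]


def map_row(w, x):
    # "Map" step for one row: its contribution to the sums behind
    # E[x * g(w.x)] and E[g'(w.x)], with g = tanh (an assumed choice).
    g = math.tanh(dot(w, x))
    return [xi * g for xi in x], 1.0 - g * g


def combine(a, b):
    # "Reduce" step: add two partial sums.
    return [p + q for p, q in zip(a[0], b[0])], a[1] + b[1]


def fastica_deflation(rows, n_components, max_iter=200, tol=1e-6, seed=0):
    # rows are assumed to be centered and whitened already.
    rng = random.Random(seed)
    n, dim = len(rows), len(rows[0])
    found, passes = [], 0
    for _ in range(n_components):          # outer loop: one component at a time
        w = normalize([rng.gauss(0.0, 1.0) for _ in range(dim)])
        for _ in range(max_iter):          # inner loop: fixed-point iterations
            sum_xg, sum_dg = reduce(combine, (map_row(w, x) for x in rows))
            passes += 1                    # one full pass over the data
            w_new = [s / n - (sum_dg / n) * wi for s, wi in zip(sum_xg, w)]
            for v in found:                # deflation: remove earlier components
                proj = dot(w_new, v)
                w_new = [a - proj * b for a, b in zip(w_new, v)]
            w_new = normalize(w_new)
            converged = abs(abs(dot(w_new, w)) - 1.0) < tol
            w = w_new
            if converged:
                break
        found.append(w)
    return found, passes


def demo_rows(n=2000, seed=1):
    # Two independent unit-variance uniform sources mixed by a rotation,
    # so the mixture is approximately white without a whitening step.
    rng = random.Random(seed)
    c, s, r = math.cos(0.6), math.sin(0.6), math.sqrt(3.0)
    rows = []
    for _ in range(n):
        s1, s2 = rng.uniform(-r, r), rng.uniform(-r, r)
        rows.append([c * s1 - s * s2, s * s1 + c * s2])
    return rows


if __name__ == "__main__":
    components, passes = fastica_deflation(demo_rows(), n_components=2)
    print("full passes over the data:", passes)
    for w in components:
        print(w)
```

The number of passes grows with both the number of components and the
iterations each one needs, which is why per-iteration overhead matters so
much for this kind of algorithm.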

--sebastian

On 07.01.2014 22:01, Oleksandr Olgashko wrote:
> What can you say about
> http://www.cs.stanford.edu/people/ang//papers/nips06-mapreducemulticore.pdf?
> 
> 
> 2014/1/7 Dmitriy Lyubimov <[email protected]>
> 
>> Yes. Create working notes on how exactly to do that. (Or, what I am
>> nudging you towards, do it on Spark, since MR is not really an
>> iteration-friendly platform and it looks like iterations are needed in
>> fastICA.)
>>
>>
>> On Tue, Jan 7, 2014 at 12:38 PM, Oleksandr Olgashko <[email protected]> wrote:
>>
>>> So the problem is to adapt ICA for MR, am I right?
>>>
>>>
>>>
>>> 2014/1/7 Dmitriy Lyubimov <[email protected]>
>>>
>>>> I already looked at fastICA. While it claims to be parallel, this work
>>>> doesn't exactly map it onto the MapReduce (or Spark) paradigm, and from
>>>> what I can recollect it still implies outer iterations for fitting the
>>>> independent component vectors one by one. That means it is probably
>>>> MR-unfriendly by construction. Spark may show far better promise here,
>>>> but a working notes document is still required to show exactly how.
>>>> That's what I mean.
>>>>
>>>>
>>>> On Tue, Jan 7, 2014 at 1:35 AM, Oleksandr Olgashko <[email protected]> wrote:
>>>>
>>>>> Could you please take a look at this article?
>>>>> http://cran.r-project.org/web/packages/fastICA/fastICA.pdf
>>>>> I have learned that re-inventing the wheel is wrong for most problems,
>>>>> and that a better solution usually exists. However, it often needs some
>>>>> "grinding", so I could research those approaches if you approve.
>>>>>
>>>>> About Scala: unfortunately, I have never worked with this language
>>>>> before, but I have wanted to. I'd like to fill that gap in my skills,
>>>>> but I don't know exactly where to start.
>>>>>
>>>>>
>>>>> 2014/1/7 Dmitriy Lyubimov <[email protected]>
>>>>>
>>>>>> ICA is a very useful technique for dimensionality reduction. I believe
>>>>>> Mahout would benefit from it; however, the challenges are fairly
>>>>>> significant in terms of a proven parallelization technique and
>>>>>> acceptable efficacy, which makes it hard to just "implement" (I am not
>>>>>> familiar at this point with any concrete work on parallel ICA). So, like
>>>>>> I said before, I am not very hopeful. However, if one never tries, then
>>>>>> nothing will ever get done. Who knows.
>>>>>>
>>>>>>
>>>>>> On Mon, Jan 6, 2014 at 2:18 PM, Isabel Drost-Fromm <[email protected]> wrote:
>>>>>>
>>>>>>> On Mon, Jan 06, 2014 at 10:40:45PM +0200, Oleksandr Olgashko wrote:
>>>>>>>> Returning to the question I asked 2 months ago about a topic to
>>>>>>>> work on: what algorithm should I implement?
>>>>>>>
>>>>>>> To be quite frank with you: None. Personally I'd rather see
>>>>>>> improvements (in terms of documentation, integration, stabilisation,
>>>>>>> performance optimisation) of the existing Mahout source.
>>>>>>>
>>>>>>> Feel free to take a closer look at the thread concerning "getting
>>>>>>> involved" that we had around Christmas last year for inspiration.
>>>>>>>
>>>>>>>
>>>>>>> Isabel
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
