Re: When is PCA expected to be fully implemented into Mahout?

Raphael Cendrillon Mon, 05 Dec 2011 13:03:36 -0800

Thanks for clarifying. I think we're all on the same page on this, although 
using different terms. I'll package up the job I currently have for this and 
submit a patch.


By the way, currently I have the rows being added at the combiner, and then the 
results of the combiners added in a single reducer. Do you think this is 
sufficient, or should multiple reducers be used (per column) to further spread 
the load?
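
To make the question concrete, here is a minimal sketch of that flow in plain Python. It is not the actual Hadoop job: the names `combine` and `reduce_partials`, the dict-based sparse rows, and the sample splits are all made up for illustration. Each combiner stand-in adds up the rows of one split and keeps a row count, and a single reducer stand-in merges those partials into the column-wise mean.

```python
from typing import Dict, Iterable, List, Tuple

SparseRow = Dict[int, float]      # column index -> value (hypothetical layout)
Partial = Tuple[SparseRow, int]   # (per-column sums, number of rows summed)


def combine(rows: Iterable[SparseRow]) -> Partial:
    """Combiner stand-in: add up the rows of one input split."""
    sums: SparseRow = {}
    count = 0
    for row in rows:
        for col, value in row.items():
            sums[col] = sums.get(col, 0.0) + value
        count += 1
    return sums, count


def reduce_partials(partials: Iterable[Partial], num_cols: int) -> List[float]:
    """Single-reducer stand-in: merge partial sums, divide by total row count."""
    totals = [0.0] * num_cols
    n = 0
    for sums, count in partials:
        for col, value in sums.items():
            totals[col] += value
        n += count
    if n == 0:
        raise ValueError("cannot compute a mean over zero rows")
    return [t / n for t in totals]


# Two hypothetical input splits of a 4 x 3 sparse matrix.
splits = [
    [{0: 1.0, 2: 3.0}, {1: 2.0}],
    [{0: 5.0}, {2: 1.0}],
]
column_means = reduce_partials((combine(s) for s in splits), num_cols=3)
```

In this sketch the single reducer receives one partial per split rather than one record per row, so its load grows with the number of splits times the number of columns. A per-column variant would instead key each partial sum by its column index.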

On Dec 5, 2011, at 11:38 AM, Dmitriy Lyubimov <[email protected]> wrote:

> ok column-wise mean. (the mean of all rows).
> 
> On Mon, Dec 5, 2011 at 11:00 AM, Ted Dunning <[email protected]> wrote:
>> Row-wise mean usually means that a mean of each row is computed.
>> 
>> I think that most PCA users would want column-wise means for subtraction.
>> 
>> On Mon, Dec 5, 2011 at 10:58 AM, Dmitriy Lyubimov <[email protected]> wrote:
>> 
>>> We probably need a row-wise mean computation job anyway as a separate MR
>>> step. Wanna take a stab?
>>> On Dec 5, 2011 10:34 AM, "Raphael Cendrillon" <[email protected]>
>>> wrote:
>>> 
>>>> Given that this request seems to come up frequently, would it be worth
>>>> putting this approach under mahout-examples?  Initially it could use the
>>>> brute force approach together with SSVD, and be updated later once support
>>>> is ready for mean-subtraction within SSVD.
>>>> 
>>>> I could put something together if there's interest.
>>>> 
>>>> On Mon, Dec 5, 2011 at 9:40 AM, Dmitriy Lyubimov <[email protected]>
>>>> wrote:
>>>> 
>>>>> I am working on the additions to the SSVD algorithms, and the mods to the
>>>>> current solver will probably emerge in a matter of a month, my schedule
>>>>> permitting.
>>>>> 
>>>>> However, a brute force approach is already possible. If your input is of
>>>>> moderate size, or if it is already dense, you could compute the mean and
>>>>> subtract it yourself very easily and then shove it into the SSVD solver
>>>>> while requesting to produce either U or V, depending on whether you
>>>>> subtract the column-wise or the row-wise mean.
>>>>> 
>>>>> The only problem with the brute force approach is that it would densify
>>>>> originally sparse input. Depending on your problem and the number of
>>>>> machine nodes you can spare, it may or may not be a problem.
>>>>> On Dec 4, 2011 7:59 PM, "magicalo" <[email protected]> wrote:
>>>>> 
>>>>>> Hello,
>>>>>> 
>>>>>> Is there an expected release date for the PCA algorithm as part of
>>>>>> Mahout?
>>>>>> Tx!
>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>>
