Thanks for clarifying. I think we're all on the same page here, just using different terms. I'll package up the job I currently have for this and submit a patch.
By the way, currently I have the rows being added at the combiner, and then the results of the combiners added in a single reducer. Do you think this is sufficient, or should multiple reducers be used (per column) to further spread the load?

On Dec 5, 2011, at 11:38 AM, Dmitriy Lyubimov <[email protected]> wrote:

> ok column-wise mean. (the mean of all rows).
>
> On Mon, Dec 5, 2011 at 11:00 AM, Ted Dunning <[email protected]> wrote:
>> Row-wise mean usually means that a mean of each row is computed.
>>
>> I think that most PCA users would want column-wise means for subtraction.
>>
>> On Mon, Dec 5, 2011 at 10:58 AM, Dmitriy Lyubimov <[email protected]> wrote:
>>
>>> We probably need row wise mean computation job anyway as a separate mr
>>> step. Wanna take a stab?
>>> On Dec 5, 2011 10:34 AM, "Raphael Cendrillon" <[email protected]> wrote:
>>>
>>>> Given that this request seems to come up frequently, would it be worth
>>>> putting this approach under mahout-examples? Initially it could use the
>>>> brute force approach together with SSVD, and updated later once support
>>>> is ready for mean-subtraction within SSVD.
>>>>
>>>> I could put something together if there's interest.
>>>>
>>>> On Mon, Dec 5, 2011 at 9:40 AM, Dmitriy Lyubimov <[email protected]> wrote:
>>>>
>>>>> I am working on the additions to ssvd algorithms and the mods to current
>>>>> solver will probably emerge in a matter of a month, my schedule
>>>>> permitting.
>>>>>
>>>>> However, a brute force approach is already possible. If your input is of
>>>>> moderate size, or if it is already dense, you could compute median and
>>>>> subtract it yourself very easily and then shove it into ssvd solver
>>>>> while requesting to produce either u or v depending if subtract column
>>>>> wise or row wise mean.
>>>>>
>>>>> The only problem with brute force approach is that it would densify
>>>>> originally sparse input. Depending on your problem and # of machine
>>>>> nodes you can spare, it may or may not be a problem.
>>>>> On Dec 4, 2011 7:59 PM, "magicalo" <[email protected]> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> Is there an expected release date for the PCA algorithm as part of
>>>>>> Mahout?
>>>>>> Tx!
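
For reference, the combiner/single-reducer scheme asked about at the top of this message can be sketched roughly as below. This is a plain-Python illustration only, not the actual Hadoop job or any Mahout API: the function names (combine, reduce_mean), the dict-based sparse rows, and the sample splits are all assumptions made for the example.

```python
from collections import defaultdict


def combine(rows):
    """Combiner step (hypothetical): sum the sparse rows of one input split.

    Each row is a dict {column_index: value}.
    Returns (partial_column_sums, row_count) for the split.
    """
    sums = defaultdict(float)
    count = 0
    for row in rows:
        for col, value in row.items():
            sums[col] += value
        count += 1
    return dict(sums), count


def reduce_mean(partials, num_cols):
    """Single-reducer step (hypothetical): merge partials, divide by row count."""
    totals = [0.0] * num_cols
    total_rows = 0
    for sums, count in partials:
        for col, value in sums.items():
            totals[col] += value
        total_rows += count
    if total_rows == 0:
        raise ValueError("cannot compute a mean over zero rows")
    return [t / total_rows for t in totals]


# Two made-up input splits of a 3-column sparse matrix.
split_a = [{0: 1.0, 2: 3.0}, {1: 2.0}]
split_b = [{0: 5.0, 1: 4.0, 2: 3.0}]

column_means = reduce_mean([combine(split_a), combine(split_b)], num_cols=3)
```

In this sketch the reducer only sees one partial sum vector per split rather than one record per row, so its work grows with the number of splits times the number of columns. Under that assumption, splitting the reduce by column would mainly help when the rows are very wide.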
