I think it is certainly OK for you to try, and your thoughts are all the more appreciated because optimizing this stuff for big data while keeping it accurate seems to take more than one head to review.
However, I've already planned on working on MAHOUT-817 in the next two months and finishing it in Q1 if I can work out the existing issues. The existing issues concern both flow and performance and, IMO, require a bit more contemplation of the existing flow's peculiarities before a reliable flow can be figured out. On top of that, at this point I am the primary maintainer of the SSVD code, and I think you should know that introducing modifications which at this point seem fairly sizable may make it more difficult for me to maintain it -- especially given that we haven't yet considered the effect on the existing power iterations, or the future issue of introducing a Cholesky option (there's a pending issue for that as well). But I think you can catalyze that process; you have already done a lot.

On Mon, Nov 28, 2011 at 12:32 AM, Raphael Cendrillon <[email protected]> wrote:

> Hi Dmitriy,
>
> If it's OK with you I'd like to try implementing this decoration.
>
> Any advice or guidance would be very much appreciated.
>
> Raphael.
>
> On 27 Nov, 2011, at 9:23 AM, Dmitriy Lyubimov (Commented) (JIRA) wrote:
>
> > Dmitriy Lyubimov commented on MAHOUT-817:
> > -----------------------------------------
> >
> > For the column mean, the bruteforce approach is probably the simplest; we'd have to decorate input of A with mean subtraction.
> >
> >> Add PCA options to SSVD code
> >> ----------------------------
> >>
> >> Key: MAHOUT-817
> >> URL: https://issues.apache.org/jira/browse/MAHOUT-817
> >> Project: Mahout
> >> Issue Type: New Feature
> >> Affects Versions: 0.6
> >> Reporter: Dmitriy Lyubimov
> >> Assignee: Dmitriy Lyubimov
> >> Fix For: Backlog
> >>
> >>
> >> It seems that a simple solution should exist to integrate PCA mean subtraction into SSVD algorithm without making it a pre-requisite step and also avoiding densifying the big input.
> >> Several approaches were suggested:
> >> 1) subtract mean off B
> >> 2) propagate mean vector deeper into algorithm algebraically where the data is already collapsed to smaller matrices
> >> 3) --?
> >> It needs some math done first. I'll take a stab at 1 and 2 but thoughts and math are welcome.
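
For reference, here is a minimal sketch of what decorating the input A with mean subtraction could look like for the first projection step only, Y = (A - 1 mu^T) Omega, without densifying A. It is plain Python rather than Mahout code, and everything in it is an assumption made for illustration: the function names are hypothetical, sparse rows are modeled as dicts, and Omega is passed in as a small dense list instead of being generated from a seed. It relies only on the identity (a_i - mu)^T Omega = a_i^T Omega - mu^T Omega, so the shift mu^T Omega is computed once and reused for every row. It says nothing about the later steps (B, power iterations), which are exactly the parts that still need the math worked out.

```python
from typing import Dict, List

# Hypothetical representation: one sparse row = {column index: nonzero value}.
SparseRow = Dict[int, float]


def column_means(rows: List[SparseRow], n_cols: int) -> List[float]:
    """Column means of A, touching only the stored nonzeros."""
    sums = [0.0] * n_cols
    for row in rows:
        for j, v in row.items():
            sums[j] += v
    return [s / len(rows) for s in sums]


def project_centered(rows: List[SparseRow], mean: List[float],
                     omega: List[List[float]]) -> List[List[float]]:
    """Rows of (A - 1 mean^T) * Omega, computed without densifying A."""
    k = len(omega[0])
    # mean^T * Omega is a single k-vector shared by all rows.
    shift = [sum(mean[j] * omega[j][c] for j in range(len(mean)))
             for c in range(k)]
    out = []
    for row in rows:
        y = [-s for s in shift]          # start from -mean^T * Omega
        for j, v in row.items():         # add a_i^T * Omega over nonzeros only
            for c in range(k):
                y[c] += v * omega[j][c]
        out.append(y)
    return out


def project_centered_bruteforce(rows: List[SparseRow], mean: List[float],
                                omega: List[List[float]]) -> List[List[float]]:
    """Reference version: densify each row, subtract the mean, then multiply."""
    n, k = len(mean), len(omega[0])
    out = []
    for row in rows:
        dense = [row.get(j, 0.0) - mean[j] for j in range(n)]
        out.append([sum(dense[j] * omega[j][c] for j in range(n))
                    for c in range(k)])
    return out
```

The bruteforce version is there only as a correctness check on small inputs; it touches every column of every row, which is what we want to avoid on the real input.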
