It is certainly OK for you to try, and your thoughts are all the more
appreciated because optimizing this stuff for big data while keeping it
accurate seems to take more than one head to review.

However, I've already planned on doing 817 in the next two months and
finishing it in Q1 if I can work out the existing issues.
The existing issues are both flow and performance, and IMO they require a
tad more contemplation w.r.t. existing flow peculiarities before a reliable
flow can be figured out.
On top of that, at this point I am the primary maintainer of the SSVD code,
and I think you should know that introducing modifications which at this
point seem fairly sizable may make it more difficult for me to maintain it --
especially given we haven't yet considered the effect on the existing power
iterations or the future issue of introducing a Cholesky option (there's a
pending issue for that as well). But I think you can catalyze that process;
you already did a lot.


On Mon, Nov 28, 2011 at 12:32 AM, Raphael Cendrillon <
[email protected]> wrote:

> Hi Dmitriy,
>
> If it's OK with you I'd like to try implementing this decoration.
>
> Any advice or guidance would be very much appreciated.
>
> Raphael.
>
> On 27 Nov, 2011, at 9:23 AM, Dmitriy Lyubimov (Commented) (JIRA) wrote:
>
> > Dmitriy Lyubimov commented on MAHOUT-817:
> > -----------------------------------------
> >
> > For the column mean, the brute-force approach is probably the simplest:
> we'd have to decorate the input of A with mean subtraction.
> >
> >> Add PCA options to SSVD code
> >> ----------------------------
> >>
> >>                Key: MAHOUT-817
> >>                URL: https://issues.apache.org/jira/browse/MAHOUT-817
> >>            Project: Mahout
> >>         Issue Type: New Feature
> >>   Affects Versions: 0.6
> >>           Reporter: Dmitriy Lyubimov
> >>           Assignee: Dmitriy Lyubimov
> >>            Fix For: Backlog
> >>
> >>
> >> It seems that a simple solution should exist to integrate PCA mean
> subtraction into SSVD algorithm without making it a pre-requisite step and
> also avoiding densifying the big input.
> >> Several approaches were suggested:
> >> 1) subtract mean off B
> >> 2) propagate mean vector deeper into algorithm algebraically where the
> data is already collapsed to smaller matrices
> >> 3) --?
> >> It needs some math done first. I'll take a stab at 1 and 2 but
> thoughts and math are welcome.
> >
> > --
> > This message is automatically generated by JIRA.
> > If you think it was sent incorrectly, please contact your JIRA
> administrators:
> https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
> > For more information on JIRA, see:
> http://www.atlassian.com/software/jira
> >
> >
>
>
