[
https://issues.apache.org/jira/browse/MAHOUT-817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113007#comment-13113007
]
Dmitriy Lyubimov commented on MAHOUT-817:
-----------------------------------------
Why would we want to support both row and column mean subtraction? I need to
re-read the motivation for this.
I think a lot also rests on the question of whether we actually want to
_output_ the mean.
The next question is whether we want to spend one additional pass just to
find the mean. If yes, then the rest is easy: we will just do the mean
subtraction as part of the Y computation, which should be OK flops-wise.
But if we think we shouldn't wait for the mean computation as a separate
pass, and we don't want to output it either, then that's where it becomes a
little tricky.
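
To make the "extra pass, then subtract during the Y computation" option
concrete, here is a rough NumPy sketch. It is not Mahout code. It assumes
Y = A * Omega as in the usual randomized SVD formulation, and it assumes the
mean in question is the column mean (rows are observations). The names
centered_projection and omega are made up for illustration. The identity used
is (A - 1*m') * Omega = A*Omega - 1*(m'*Omega), so the centered matrix is
never formed and a sparse A stays sparse.

```python
import numpy as np


def centered_projection(A, omega):
    """Return Y = (A - 1 m^T) @ omega and the column mean m.

    Hypothetical helper: A is (rows x cols), omega is (cols x k).
    The dense centered matrix A - 1 m^T is never materialized.
    """
    # Pass 1: column means of A (one scan over the input).
    m = np.asarray(A.mean(axis=0)).ravel()

    # m^T @ omega is a single length-k row, cheap to compute once.
    correction = m @ omega

    # Pass 2: project, then subtract the same correction row from every
    # row of Y (NumPy broadcasting does the "1 * correction" part).
    Y = A @ omega - correction
    return Y, m


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.random((50, 8))
    omega = rng.standard_normal((8, 3))
    Y, m = centered_projection(A, omega)
    # Compare against the naive version that densifies the centered input.
    print(np.allclose(Y, (A - A.mean(axis=0)) @ omega))
```

The correction costs only k extra subtractions per row of Y, which fits the
"OK flops-wise" estimate above. The sketch also returns m, so outputting the
mean comes for free in this two-pass variant; the single-pass case is not
covered here.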
> Add PCA options to SSVD code
> ----------------------------
>
> Key: MAHOUT-817
> URL: https://issues.apache.org/jira/browse/MAHOUT-817
> Project: Mahout
> Issue Type: New Feature
> Affects Versions: 0.6
> Reporter: Dmitriy Lyubimov
> Assignee: Dmitriy Lyubimov
> Fix For: 0.6
>
>
> It seems that a simple solution should exist to integrate PCA mean
> subtraction into the SSVD algorithm without making it a prerequisite step
> and while also avoiding densifying the big input.
> Several approaches were suggested:
> 1) subtract mean off B
> 2) propagate the mean vector deeper into the algorithm algebraically, to
> where the data is already collapsed to smaller matrices
> 3) --?
> It needs some math done first. I'll take a stab at 1 and 2, but thoughts and
> math are welcome.