[ https://issues.apache.org/jira/browse/MAHOUT-817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13163741#comment-13163741 ]
Dmitriy Lyubimov commented on MAHOUT-817:
-----------------------------------------
OK, found a case where the Y fix matters. As soon as I move the random generator
off zero mean for the simulated orthonormal matrices used to build the test
input, a difference between the versions with and without the Y fix appears in
the output.
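For context, here is a sketch of why the input mean matters. It rests on an assumption this comment does not spell out: that the "Y fix" means applying the mean correction to the first projection Y = A\Omega, in the spirit of approach 2 in the issue description. The symbols \Omega (random projection matrix), \xi (column-mean vector of A) and \mathbf{1} (all-ones column vector) are notation introduced here, not taken from the attachments.

```latex
Y_c = (A - \mathbf{1}\xi^\top)\,\Omega = A\Omega - \mathbf{1}\,(\xi^\top \Omega)
```

When the test input is generated around zero mean, \xi \approx 0 and the correction term \mathbf{1}(\xi^\top \Omega) is negligible, so runs with and without the fix look the same. Once \xi moves away from zero, the term no longer vanishes.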
The first printout is for the PCA routine with the Y fix, the second is for the
PCA routine without the Y fix, and the third is SSVD over the mean-subtracted
matrix (A minus the mean).
I have re-attached the newest R files.
{code}
> ## PCActest
> # compute median xi
>
> xfixed=matrix(nrow=m,ncol=n)
> for ( i in 1:m) xfixed[i,]=x[i,]-xi
>
>
> respca=ssvd.cpca(x,k,qiter=qi)
fixing Y...
Warning message:
In sqrt(e$values) : NaNs produced
> # compare also with results when Y fix is ignored
> respca1=ssvd.cpca(x,k,qiter=qi,fixY=F)
Warning message:
In sqrt(e$values) : NaNs produced
>
> ressvd=ssvd.svd(xfixed,k,qiter=qi)
>
> # compare 3 sets of singular values
> respca$svalues
[1] 9.0584987 8.0500343 7.0271257 6.0267613 5.0266239 4.0221945 3.0428140
[8] 2.0328541 1.1788628 0.8524032
> respca1$svalues
[1] 9.0504971 8.0487910 7.0238114 6.0246926 5.0250013 4.0221219 3.0371404
[8] 2.0306501 1.0668975 0.3805301
> ressvd$svalues
[1] 9.0584987 8.0500343 7.0271257 6.0267613 5.0266239 4.0221945 3.0428140
[8] 2.0328541 1.1788628 0.8524032
>
> #compare first rows of singular vectors
> respca$v[1,]
[1] 0.010705297 0.002515335 -0.015630454 -0.023178851 -0.022406230
[6] -0.023602299 0.016234821 0.045020972 -0.084333758 -0.053624133
> respca1$v[1,]
[1] -0.010691547 0.002485415 -0.015705498 -0.023117058 0.022482137
[6] -0.023557896 0.015686873 0.046335615 -0.061378867 -0.226028214
> ressvd$v[1,]
[1] 0.010705297 0.002515335 -0.015630454 -0.023178851 -0.022406230
[6] -0.023602299 0.016234821 -0.045020972 0.084333758 -0.053624133
>
{code}
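The reference used in the comparison above (SSVD over the explicitly mean-subtracted matrix) can be illustrated in miniature with base R alone. This is only a sketch under stated assumptions: it uses the exact `svd()` and `prcomp()` from base R rather than the `ssvd.svd` / `ssvd.cpca` routines in the attached ssvd.R, and the matrix sizes, the seed, and the mean of 3 are hypothetical values chosen for illustration.

```r
# Illustrative only: base R svd()/prcomp(), not the ssvd.R routines.
set.seed(42)
m <- 200   # rows (hypothetical size)
n <- 10    # columns (hypothetical size)

# Random input deliberately generated off zero mean.
x <- matrix(rnorm(m * n, mean = 3), nrow = m, ncol = n)

# Column means and the explicitly centered matrix
# (same effect as the row-by-row loop that builds xfixed above).
xi <- colMeans(x)
xfixed <- sweep(x, 2, xi)   # subtract xi[j] from column j

# Singular values with and without mean subtraction.
sv_centered <- svd(xfixed)$d
sv_raw <- svd(x)$d

# prcomp() centers internally; its sdev equals d / sqrt(m - 1).
sv_pca <- prcomp(x, center = TRUE, scale. = FALSE)$sdev * sqrt(m - 1)

# Centered SVD and PCA agree; the uncentered SVD does not.
print(isTRUE(all.equal(sv_centered, sv_pca)))
print(isTRUE(all.equal(sv_raw, sv_pca)))
```

With input off zero mean, the uncentered singular values are dominated by the mean component. That is why any PCA variant has to reproduce the centered result, as the first and third printouts above do.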
> Add PCA options to SSVD code
> ----------------------------
>
> Key: MAHOUT-817
> URL: https://issues.apache.org/jira/browse/MAHOUT-817
> Project: Mahout
> Issue Type: New Feature
> Affects Versions: 0.6
> Reporter: Dmitriy Lyubimov
> Assignee: Dmitriy Lyubimov
> Fix For: Backlog
>
> Attachments: SSVD-PCA options.pdf, ssvd-tests.R, ssvd.R, ssvd.m
>
>
> It seems that a simple solution should exist to integrate PCA mean
> subtraction into the SSVD algorithm without making it a prerequisite step,
> while also avoiding densifying the big input.
> Several approaches were suggested:
> 1) subtract the mean from B
> 2) propagate the mean vector deeper into the algorithm algebraically, to
> where the data is already collapsed to smaller matrices
> 3) --?
> It needs some math done first. I'll take a stab at 1 and 2 but thoughts and
> math are welcome.