[ 
https://issues.apache.org/jira/browse/MAHOUT-817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13163741#comment-13163741
 ] 

Dmitriy Lyubimov commented on MAHOUT-817:
-----------------------------------------

OK, I found a case that the Y fix affects. As soon as I take the random 
generator off zero mean for the simulated orthonormal matrices used as test 
input, a difference between the versions with and without the Y fix appears in 
the output.

The first printout is for the PCA routine with the Y fix, the second is for the 
PCA routine without the Y fix, and the third is SSVD over the mean-subtracted 
matrix (A minus mean).

I have re-attached the newest R files.

{code}
> ## PCActest
> # compute median xi
> 
> xfixed=matrix(nrow=m,ncol=n)
> for ( i in 1:m) xfixed[i,]=x[i,]-xi
> 
> 
> respca=ssvd.cpca(x,k,qiter=qi)
fixing Y...
Warning message:
In sqrt(e$values) : NaNs produced
> # compare also with results when Y fix is ignored
> respca1=ssvd.cpca(x,k,qiter=qi,fixY=F)
Warning message:
In sqrt(e$values) : NaNs produced
> 
> ressvd=ssvd.svd(xfixed,k,qiter=qi)
> 
> # compare 3 sets of singular values
> respca$svalues
 [1] 9.0584987 8.0500343 7.0271257 6.0267613 5.0266239 4.0221945 3.0428140
 [8] 2.0328541 1.1788628 0.8524032
> respca1$svalues
 [1] 9.0504971 8.0487910 7.0238114 6.0246926 5.0250013 4.0221219 3.0371404
 [8] 2.0306501 1.0668975 0.3805301
> ressvd$svalues
 [1] 9.0584987 8.0500343 7.0271257 6.0267613 5.0266239 4.0221945 3.0428140
 [8] 2.0328541 1.1788628 0.8524032
> 
> #compare first rows of singular vectors
> respca$v[1,]
 [1]  0.010705297  0.002515335 -0.015630454 -0.023178851 -0.022406230
 [6] -0.023602299  0.016234821  0.045020972 -0.084333758 -0.053624133
> respca1$v[1,]
 [1] -0.010691547  0.002485415 -0.015705498 -0.023117058  0.022482137
 [6] -0.023557896  0.015686873  0.046335615 -0.061378867 -0.226028214
> ressvd$v[1,]
 [1]  0.010705297  0.002515335 -0.015630454 -0.023178851 -0.022406230
 [6] -0.023602299  0.016234821 -0.045020972  0.084333758 -0.053624133
> 
{code}
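In the printout above, the first rows of respca$v and ressvd$v differ only in the sign of entries 8 and 9. Singular vectors are determined only up to sign, so a comparison that ignores sign is the fairer check. The following is a minimal sketch of such a check, using only the first-row values printed above. The helper same_up_to_sign is hypothetical and is not part of the attached ssvd.R, and a single row is of course weaker evidence than comparing whole columns.

```r
# Hypothetical helper (not from ssvd.R): for each column, report whether
# v1[, j] equals v2[, j] or -v2[, j] within a tolerance.
same_up_to_sign <- function(v1, v2, tol = 1e-6) {
  v1 <- as.matrix(v1)
  v2 <- as.matrix(v2)
  sapply(seq_len(ncol(v1)), function(j) {
    d_same <- max(abs(v1[, j] - v2[, j]))  # distance if signs agree
    d_flip <- max(abs(v1[, j] + v2[, j]))  # distance if signs are flipped
    min(d_same, d_flip) < tol
  })
}

# First rows as printed in the transcript, held as 1 x 10 matrices.
pca_row <- matrix(c(0.010705297, 0.002515335, -0.015630454, -0.023178851,
                    -0.022406230, -0.023602299, 0.016234821, 0.045020972,
                    -0.084333758, -0.053624133), nrow = 1)
svd_row <- matrix(c(0.010705297, 0.002515335, -0.015630454, -0.023178851,
                    -0.022406230, -0.023602299, 0.016234821, -0.045020972,
                    0.084333758, -0.053624133), nrow = 1)

# TRUE for a column means the two results agree there up to sign.
agree <- same_up_to_sign(pca_row, svd_row)
print(agree)
```

With full matrices, the same call would be same_up_to_sign(respca$v, ressvd$v), assuming both results carry a v component of equal dimensions as in the transcript.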
                
> Add PCA options to SSVD code
> ----------------------------
>
>                 Key: MAHOUT-817
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-817
>             Project: Mahout
>          Issue Type: New Feature
>    Affects Versions: 0.6
>            Reporter: Dmitriy Lyubimov
>            Assignee: Dmitriy Lyubimov
>             Fix For: Backlog
>
>         Attachments: SSVD-PCA options.pdf, ssvd-tests.R, ssvd.R, ssvd.m
>
>
> It seems that a simple solution should exist to integrate PCA mean 
> subtraction into the SSVD algorithm without making it a prerequisite step 
> and while also avoiding densifying the big input. 
> Several approaches were suggested:
> 1) subtract mean off B
> 2) propagate mean vector deeper into algorithm algebraically where the data 
> is already collapsed to smaller matrices
> 3) --?
> It needs some math done first. I'll take a stab at 1 and 2, but thoughts and 
> math are welcome.
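
One way to read approach 2 is through the distributive identity (A - 1 xi') Omega = A Omega - 1 (xi' Omega): the mean only ever touches a small 1 x k row, so the centered input never has to be formed. The sketch below checks that identity numerically in base R. The sizes, the seed, and the names A, xi, Omega, Y_dense and Y_alg are illustrative assumptions, and this is not necessarily how the Y fix in the attached ssvd.R is implemented.

```r
set.seed(1)
m <- 50; n <- 20; k <- 5                        # illustrative sizes only

A <- matrix(rnorm(m * n, mean = 3), nrow = m)   # input with non-zero mean
xi <- colMeans(A)                               # column means, length n
Omega <- matrix(rnorm(n * k), nrow = n)         # random projection, n x k

# Dense route: explicitly subtract the mean row from every row of A.
# This is the step that would densify a sparse input.
Y_dense <- sweep(A, 2, xi) %*% Omega

# Algebraic route: project A as-is, then subtract the projected mean,
# a single 1 x k row, from every row of the m x k result.
xi_omega <- drop(t(xi) %*% Omega)               # length-k vector
Y_alg <- sweep(A %*% Omega, 2, xi_omega)

# Largest absolute difference between the two routes (floating-point noise).
max_diff <- max(abs(Y_dense - Y_alg))
print(max_diff < 1e-10)
```

The correction costs one n x k product with a single row plus an m x k subtraction, both on matrices that are already small compared with the input.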

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
