Hm, yeah. I can do the version of distributed QR used in MR SSVD and subsequently described by Nathan Halko in his dissertation. That version seemed to be incredibly numerically stable.
But I guess this is too much for work that is not aligned with my current interests. Anyway, Cholesky-based SSVD should be enough (for now), I suppose.

My PCA test exhibits strange behavior: SSVD finds rank deficiency at the 25th value, even though I generate the input with 100 singular vectors and a 100:1 spectrum. I may have an error in the input generation part, but even if I do, I would not expect it to be that bad.

https://github.com/apache/mahout/blob/trunk/math-scala/src/test/scala/org/apache/mahout/math/scalabindings/MathSuite.scala

Line 176, test("spca"), is the in-core version of the test (the distributed test generated 100% identical input, with 100% identical results seen).

On Mon, Mar 17, 2014 at 2:26 PM, Dmitriy Lyubimov (JIRA) <[email protected]> wrote:

>
> [https://issues.apache.org/jira/browse/MAHOUT-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
>
> Dmitriy Lyubimov updated MAHOUT-1346:
> -------------------------------------
>
>     Attachment: ScalaSparkBindings.pdf
>
> updating docs to reflect latest committed state.
> Brought in distributed and in-core stochastic PCA scripts, colmeans,
> colsums, drm-vector multiplication, more tests etc.etc. see the doc.
>
> > Spark Bindings (DRM)
> > --------------------
> >
> >                 Key: MAHOUT-1346
> >                 URL: https://issues.apache.org/jira/browse/MAHOUT-1346
> >             Project: Mahout
> >          Issue Type: Improvement
> >    Affects Versions: 0.9
> >            Reporter: Dmitriy Lyubimov
> >            Assignee: Dmitriy Lyubimov
> >             Fix For: 1.0
> >
> >         Attachments: ScalaSparkBindings.pdf
> >
> >
> > Spark bindings for Mahout DRM.
> > DRM DSL.
> > Disclaimer. This will all be experimental at this point.
> > The idea is to wrap DRM by Spark RDD with support of some basic
> > functionality, perhaps some humble beginning of Cost-based optimizer
> > (0) Spark serialization support for Vector, Matrix
> > (1) Bagel transposition
> > (2) slim X'X
> > (2a) not-so-slim X'X
> > (3) blockify() (compose RDD containing vertical blocks of original input)
> > (4) read/write Mahout DRM off HDFS
> > (5) A'B
> > ...
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.2#6252)
>
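
For reference, a rough, self-contained sanity check of the expectation stated above, namely that a 100:1 spectrum should not by itself make a Cholesky-based QR step see rank deficiency. This is only a sketch in plain Python, not the Mahout test and not its input generator. Everything in it is an assumption made for illustration: the function names, the sizes, the reading of "spectrum 100:1" as singular values spaced linearly from 100 down to 1, and the use of single Householder reflectors as the orthogonal factors.

```python
import math
import random


def householder(n, seed):
    """Orthogonal n x n matrix H = I - 2 v v' / (v'v); H is also symmetric."""
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    vv = sum(x * x for x in v)
    return [[(1.0 if i == j else 0.0) - 2.0 * v[i] * v[j] / vv
             for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Dense product of two list-of-rows matrices."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def cholesky_upper(g):
    """Upper-triangular R with R'R = g; raises on a non-positive pivot."""
    n = len(g)
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        d = g[j][j] - sum(r[k][j] ** 2 for k in range(j))
        if d <= 0.0:
            raise ValueError("non-positive pivot at index %d" % j)
        r[j][j] = math.sqrt(d)
        for i in range(j + 1, n):
            acc = sum(r[k][j] * r[k][i] for k in range(j))
            r[j][i] = (g[j][i] - acc) / r[j][j]
    return r


def cholesky_qr_check(m=150, k=100, s_max=100.0, s_min=1.0):
    """Build A (m x k) with a known spectrum, run Cholesky QR, report quality.

    Returns (smallest Cholesky pivot, max |Q'Q - I| entry).
    """
    # Assumed spectrum: k singular values spaced linearly from s_max to s_min.
    s = [s_max - (s_max - s_min) * i / (k - 1) for i in range(k)]
    u = householder(m, seed=1)
    v = householder(k, seed=2)
    # A = U[:, :k] * diag(s) * V'  (V is symmetric, so V' == V).
    a = matmul([[u[i][j] * s[j] for j in range(k)] for i in range(m)], v)
    at = [list(c) for c in zip(*a)]
    r = cholesky_upper(matmul(at, a))  # Gram matrix A'A = R'R
    # Q = A R^{-1}: solve q R = a_row for each row by forward substitution.
    q = []
    for row in a:
        q_row = [0.0] * k
        for j in range(k):
            acc = sum(q_row[i] * r[i][j] for i in range(j))
            q_row[j] = (row[j] - acc) / r[j][j]
        q.append(q_row)
    qt = [list(c) for c in zip(*q)]
    qtq = matmul(qt, q)
    err = max(abs(qtq[i][j] - (1.0 if i == j else 0.0))
              for i in range(k) for j in range(k))
    return min(r[j][j] for j in range(k)), err
```

Calling `cholesky_qr_check()` returns the smallest Cholesky pivot and the largest deviation of Q'Q from the identity. Under the assumed spectrum the Gram matrix has condition number 100^2 = 10^4, which is far from the limits of double precision, so the factorization should complete with all pivots positive. This says nothing about what the actual "spca" test feeds in; it only suggests that if rank deficiency shows up at the 25th value, the input generation is worth a closer look.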
