Hm, yeah. I can do the version of distributed QR used in MR SSVD and
subsequently defined by Nathan Halko in his dissertation. That version
seemed to be incredibly numerically stable.

But I guess this is too much work for something not aligned with my
current interest.

Anyway, Cholesky-based SSVD should be enough (for now), I suppose. My PCA
test exhibits strange behavior: SSVD finds rank deficiency at the 25th
singular value, even though I generate the input with 100 singular vectors
and a 100:1 spectrum. I may have an error in the input generation part, but
even if I do, I would not expect it to be that bad.
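
For reference, here is a minimal sketch of the Cholesky-based QR step, in
plain Python rather than the actual Mahout Scala code. The function names
(gram, cholesky, cholesky_qr) are hypothetical and exist only for this
illustration. It forms the Gram matrix Y'Y, factors it as L L', and takes
R = L' and Q = Y R^{-1}. A non-positive pivot during the factorization is
how this sketch reports rank deficiency; the real implementation may detect
it differently.

```python
import math
import random


def gram(y):
    """Return Y'Y for a tall matrix stored as a list of rows."""
    k = len(y[0])
    g = [[0.0] * k for _ in range(k)]
    for row in y:
        for i in range(k):
            for j in range(k):
                g[i][j] += row[i] * row[j]
    return g


def cholesky(g):
    """Return lower-triangular L with L L' = G, or raise if a pivot is not positive."""
    k = len(g)
    l = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = g[i][j] - sum(l[i][p] * l[j][p] for p in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("pivot %d is not positive: rank deficient" % i)
                l[i][j] = math.sqrt(s)
            else:
                l[i][j] = s / l[j][j]
    return l


def cholesky_qr(y):
    """Thin QR via Cholesky: Y = Q R with R = L' and Q = Y R^{-1}."""
    l = cholesky(gram(y))
    k = len(l)
    q = []
    for row in y:
        # Solve q R = row by forward substitution, using R[p][j] == L[j][p].
        qrow = []
        for j in range(k):
            s = row[j] - sum(qrow[p] * l[j][p] for p in range(j))
            qrow.append(s / l[j][j])
        q.append(qrow)
    r = [[l[j][i] for j in range(k)] for i in range(k)]  # R = L' (upper triangular)
    return q, r


# Small well-conditioned demo input (assumed data, not the PCA test input).
random.seed(0)
m, k = 50, 4
y = [[random.gauss(0.0, 1.0) for _ in range(k)] for _ in range(m)]
q, r = cholesky_qr(y)

qtq = gram(q)
ortho_err = max(abs(qtq[i][j] - (1.0 if i == j else 0.0))
                for i in range(k) for j in range(k))
recon_err = max(abs(sum(q[a][p] * r[p][b] for p in range(k)) - y[a][b])
                for a in range(m) for b in range(k))
print("max |Q'Q - I| =", ortho_err)
print("max |QR - Y|  =", recon_err)
```

The known weak point of this approach is that forming Y'Y squares the
condition number of Y. If Y itself had a 100:1 spectrum, the Gram matrix
would be roughly 10^4:1, which double precision should handle comfortably.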

https://github.com/apache/mahout/blob/trunk/math-scala/src/test/scala/org/apache/mahout/math/scalabindings/MathSuite.scala

Line 176, test("spca"), is the in-core version of the test (the distributed
test generated 100% identical input and produced 100% identical results).


On Mon, Mar 17, 2014 at 2:26 PM, Dmitriy Lyubimov (JIRA) <[email protected]> wrote:

>
>      [
> https://issues.apache.org/jira/browse/MAHOUT-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]
>
> Dmitriy Lyubimov updated MAHOUT-1346:
> -------------------------------------
>
>     Attachment: ScalaSparkBindings.pdf
>
> Updating docs to reflect latest committed state.
> Brought in distributed and in-core stochastic PCA scripts, colmeans,
> colsums, drm-vector multiplication, more tests, etc. See the doc.
>
> > Spark Bindings (DRM)
> > --------------------
> >
> >                 Key: MAHOUT-1346
> >                 URL: https://issues.apache.org/jira/browse/MAHOUT-1346
> >             Project: Mahout
> >          Issue Type: Improvement
> >    Affects Versions: 0.9
> >            Reporter: Dmitriy Lyubimov
> >            Assignee: Dmitriy Lyubimov
> >             Fix For: 1.0
> >
> >         Attachments: ScalaSparkBindings.pdf
> >
> >
> > Spark bindings for Mahout DRM.
> > DRM DSL.
> > Disclaimer. This will all be experimental at this point.
> > The idea is to wrap DRM by Spark RDD with support of some basic
> > functionality, perhaps some humble beginning of Cost-based optimizer
> > (0) Spark serialization support for Vector, Matrix
> > (1) Bagel transposition
> > (2) slim X'X
> > (2a) not-so-slim X'X
> > (3) blockify() (compose RDD containing vertical blocks of original input)
> > (4) read/write Mahout DRM off HDFS
> > (5) A'B
> > ...
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.2#6252)
>
