[
https://issues.apache.org/jira/browse/SPARK-18599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15702885#comment-15702885
]
Joseph K. Bradley commented on SPARK-18599:
-------------------------------------------
It would be great to test this as a Spark Package first; that will let us
collect feedback from users to get a better idea of whether it should be put in
MLlib itself. Feel free to link the package from this JIRA, and to use this
JIRA for users to post results.
(Also, please let committers set the "Target Version" field.)
Thanks!
> Add the Spectral LDA algorithm
> ------------------------------
>
> Key: SPARK-18599
> URL: https://issues.apache.org/jira/browse/SPARK-18599
> Project: Spark
> Issue Type: New Feature
> Components: ML
> Reporter: Jencir Lee
> Labels: lda
>
> The Spectral LDA algorithm transforms the LDA problem into an orthogonal
> tensor decomposition problem. [Anandkumar 2012] establishes a theoretical
> guarantee for the convergence of orthogonal tensor decomposition.
> This algorithm first builds the 2nd-order and 3rd-order moments from the
> empirical word counts, orthogonalizes them, and finally performs the tensor
> decomposition on the empirical data moments. The whole procedure consists
> purely of linear algebra operations and can leverage machine-native
> BLAS/LAPACK libraries (Spark needs to be compiled with the `-Pnetlib-lgpl`
> option).
> It achieves log-perplexity competitive with Online Variational Inference, and
> in the shortest time. It also has clean memory usage: as of v2.0.0 we have
> experienced crashes due to memory problems with the built-in Gibbs Sampler or
> Online Variational Inference, but never with the Spectral LDA algorithm. This
> algorithm scales linearly.
> The original repo is at
> https://github.com/FurongHuang/SpectralLDA-TensorSpark. We refactored the code
> to follow the Spark coding style and interfaces when porting it over for the
> PR. We wrote a report describing the algorithm in detail and listing test
> results at https://www.overleaf.com/read/wscdvwrjmtmw. The report will be
> added to our official repo soon.
> REFERENCES
> Anandkumar, Anima, et al., Tensor decompositions for learning latent variable
> models, 2012, https://arxiv.org/abs/1210.7559.
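
For readers unfamiliar with the final step described in the issue, here is a
minimal sketch of orthogonal tensor decomposition using the tensor power method
with deflation, one of the approaches analyzed in the Anandkumar et al. paper.
It is not taken from the SpectralLDA-TensorSpark code and may differ from what
the PR does. It assumes the 3rd-order tensor is already symmetric and
orthogonally decomposable (that is, moment construction and whitening have been
done elsewhere), and all function names are hypothetical.

```python
import math
import random


def build_tensor(weights, vectors):
    """Build T = sum_i w_i * (v_i x v_i x v_i) as nested lists (toy input)."""
    k = len(vectors[0])
    T = [[[0.0] * k for _ in range(k)] for _ in range(k)]
    for w, v in zip(weights, vectors):
        for a in range(k):
            for b in range(k):
                for c in range(k):
                    T[a][b][c] += w * v[a] * v[b] * v[c]
    return T


def contract(T, u):
    """Return the vector T(I, u, u): T contracted with u on two modes."""
    k = len(u)
    return [
        sum(T[a][b][c] * u[b] * u[c] for b in range(k) for c in range(k))
        for a in range(k)
    ]


def normalize(u):
    norm = math.sqrt(sum(x * x for x in u))
    return [x / norm for x in u]


def power_iteration(T, rng, restarts=10, iters=50):
    """Run the power update from several random starts; keep the best one."""
    k = len(T)
    best_val, best_vec = float("-inf"), None
    for _ in range(restarts):
        u = normalize([rng.gauss(0.0, 1.0) for _ in range(k)])
        for _ in range(iters):
            u = normalize(contract(T, u))
        # Eigenvalue estimate T(u, u, u) for the converged direction.
        val = sum(x * y for x, y in zip(u, contract(T, u)))
        if val > best_val:
            best_val, best_vec = val, u
    return best_val, best_vec


def decompose(T, n_components, seed=0):
    """Recover (weight, vector) pairs one at a time, deflating after each."""
    rng = random.Random(seed)
    k = len(T)
    residual = [[row[:] for row in mat] for mat in T]  # leave the input intact
    components = []
    for _ in range(n_components):
        lam, v = power_iteration(residual, rng)
        components.append((lam, v))
        # Deflation: subtract the recovered rank-1 term from the residual.
        for a in range(k):
            for b in range(k):
                for c in range(k):
                    residual[a][b][c] -= lam * v[a] * v[b] * v[c]
    return components


if __name__ == "__main__":
    s = 1.0 / math.sqrt(2.0)
    # Toy orthonormal "topic" directions and weights, chosen for illustration.
    toy_vectors = [[s, s, 0.0], [s, -s, 0.0], [0.0, 0.0, 1.0]]
    toy_tensor = build_tensor([3.0, 2.0, 1.0], toy_vectors)
    for lam, v in decompose(toy_tensor, 3):
        print(lam, v)
```

In the Spectral LDA setting the recovered pairs would still have to be mapped
back through the whitening transform to obtain topic-word distributions; that
step is omitted here.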