[
https://issues.apache.org/jira/browse/SPARK-18599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15699975#comment-15699975
]
Sean Owen commented on SPARK-18599:
-----------------------------------
If this is already a usable stand-alone package, does it need to be in Spark?
The general idea is for things like this not to be pushed into the project
itself.
> Add the Spectral LDA algorithm
> ------------------------------
>
> Key: SPARK-18599
> URL: https://issues.apache.org/jira/browse/SPARK-18599
> Project: Spark
> Issue Type: New Feature
> Components: MLlib
> Reporter: Jencir Lee
> Labels: lda
>
> The Spectral LDA algorithm transforms the LDA problem into an orthogonal tensor
> decomposition problem. [[Anandkumar 2012]] establishes theoretical guarantees
> for the convergence of orthogonal tensor decomposition.
> This algorithm first builds 2nd-order and 3rd-order moments from the empirical
> word counts, orthogonalizes them, and finally performs the tensor decomposition
> on the empirical data moments. The whole procedure consists purely of linear
> algebra and can leverage machine-native BLAS/LAPACK libraries (Spark needs to be
> compiled with the `-Pnetlib-lgpl` option).
> It achieves competitive log-perplexity vs Online Variational Inference in the
> shortest time. It also has clean memory usage -- as of v2.0.0 we've
> experienced crashes due to memory problems with the built-in Gibbs Sampler or
> Online Variational Inference, but never with the Spectral LDA algorithm. This
> algorithm is linearly scalable.
> The original repo is at
> https://github.com/FurongHuang/SpectralLDA-TensorSpark. We refactored it to
> follow the Spark coding style and interfaces when porting it over for the PR.
> We wrote a report describing the algorithm in detail and listing test results
> at https://www.overleaf.com/read/wscdvwrjmtmw. The report will be added to our
> official repo soon.
> REFERENCES
> Anandkumar, Anima, et al., Tensor decompositions for learning latent variable
> models, 2012, https://arxiv.org/abs/1210.7559.
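
For readers unfamiliar with the decomposition step described in the quoted
issue, the following is a minimal sketch of tensor power iteration with
deflation, one of the methods analysed in the cited paper. It is not the code
from the SpectralLDA-TensorSpark repository or the PR, and nothing here says
which decomposition method that code uses. The sketch skips moment construction
and whitening and assumes an exactly orthogonally decomposable 3rd-order tensor
is already given. All function names are hypothetical.

```python
import math
import random


def build_tensor(pairs):
    """Build T = sum_i lam_i * (a_i x a_i x a_i) from (lam_i, a_i) pairs."""
    n = len(pairs[0][1])
    return [[[sum(lam * a[i] * a[j] * a[k] for lam, a in pairs)
              for k in range(n)] for j in range(n)] for i in range(n)]


def tensor_apply(T, v):
    """Contract T with v along two modes, i.e. compute T(I, v, v)."""
    n = len(v)
    return [sum(T[i][j][k] * v[j] * v[k] for j in range(n) for k in range(n))
            for i in range(n)]


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def power_iteration(T, n_iter=50, seed=0):
    """Return one (eigenvalue, eigenvector) pair of a symmetric tensor."""
    rng = random.Random(seed)
    v = normalize([rng.gauss(0.0, 1.0) for _ in range(len(T))])
    for _ in range(n_iter):
        v = normalize(tensor_apply(T, v))
    # The eigenvalue is T(v, v, v).
    lam = sum(a * b for a, b in zip(tensor_apply(T, v), v))
    return lam, v


def deflate(T, lam, v):
    """Subtract the recovered rank-1 component lam * (v x v x v) from T."""
    n = len(v)
    return [[[T[i][j][k] - lam * v[i] * v[j] * v[k]
              for k in range(n)] for j in range(n)] for i in range(n)]


def decompose(T, rank):
    """Recover `rank` components by repeated power iteration and deflation."""
    pairs = []
    for r in range(rank):
        lam, v = power_iteration(T, seed=r)
        pairs.append((lam, v))
        T = deflate(T, lam, v)
    return pairs


if __name__ == "__main__":
    # Toy input: three orthonormal directions with weights 3, 2 and 1.
    truth = [(3.0, [1.0, 0.0, 0.0]),
             (2.0, [0.0, 0.6, 0.8]),
             (1.0, [0.0, -0.8, 0.6])]
    for lam, v in decompose(build_tensor(truth), rank=3):
        print(round(lam, 6), [round(x, 6) for x in v])
```

The order in which components are recovered depends on the random starting
vectors. In the LDA setting the recovered pairs would still have to be mapped
back through the whitening transform to obtain topic-word distributions, which
this sketch does not attempt.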
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)