[
https://issues.apache.org/jira/browse/SPARK-18599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15699999#comment-15699999
]
Jencir Lee commented on SPARK-18599:
------------------------------------
We consider it a competitive alternative to the built-in Collapsed Gibbs
sampler and Online Variational Inference. Both of those follow an iterative
procedure, which usually takes a long time to run and even longer to check for
convergence, so in practice deciding when they have converged is largely down
to users' own experiments. In contrast, the orthogonal tensor decomposition
usually converges within seconds (it spends some time building the tensor, but
the overall runtime is much shorter). That's why we think it may be helpful for
users of the LDA model in Spark.
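
To illustrate what "orthogonal tensor decomposition" means here, below is a
minimal sketch of tensor power iteration with deflation, one of the methods
analysed in [[Anandkumar 2012]]. It is not the code from the PR or the original
repo, and it does not claim to be the decomposition routine they use; all
function names (`build_tensor`, `apply_tensor`, `power_iteration`, `decompose`)
are hypothetical, and it assumes the tensor is already symmetric with
orthonormal components and positive weights (that is, the moments have already
been orthogonalized). It uses plain Python lists for clarity, not BLAS/LAPACK.

```python
import math
import random


def build_tensor(weights, vectors):
    """Return T = sum_i w_i * (v_i x v_i x v_i) as nested lists (d x d x d)."""
    d = len(vectors[0])
    T = [[[0.0] * d for _ in range(d)] for _ in range(d)]
    for w, v in zip(weights, vectors):
        for a in range(d):
            for b in range(d):
                for c in range(d):
                    T[a][b][c] += w * v[a] * v[b] * v[c]
    return T


def apply_tensor(T, u):
    """Contract T with u on the last two modes: out[a] = sum_{b,c} T[a][b][c] u[b] u[c]."""
    d = len(u)
    return [
        sum(T[a][b][c] * u[b] * u[c] for b in range(d) for c in range(d))
        for a in range(d)
    ]


def normalize(u):
    norm = math.sqrt(sum(x * x for x in u))
    return [x / norm for x in u]


def power_iteration(T, rng, n_restarts=10, n_iters=50):
    """Find one (eigenvalue, eigenvector) pair of T, keeping the best of several random starts."""
    d = len(T)
    best_val, best_vec = -math.inf, None
    for _ in range(n_restarts):
        u = normalize([rng.gauss(0.0, 1.0) for _ in range(d)])
        for _ in range(n_iters):
            u = normalize(apply_tensor(T, u))
        # Eigenvalue estimate: T(u, u, u)
        val = sum(x * y for x, y in zip(u, apply_tensor(T, u)))
        if val > best_val:
            best_val, best_vec = val, u
    return best_val, best_vec


def decompose(T, k, seed=0):
    """Recover k (weight, vector) pairs by repeated power iteration and deflation."""
    rng = random.Random(seed)
    d = len(T)
    T = [[row[:] for row in mat] for mat in T]  # work on a copy
    pairs = []
    for _ in range(k):
        lam, v = power_iteration(T, rng)
        pairs.append((lam, v))
        # Deflate: subtract the recovered rank-1 component from T.
        for a in range(d):
            for b in range(d):
                for c in range(d):
                    T[a][b][c] -= lam * v[a] * v[b] * v[c]
    return pairs
```

For example, calling `decompose(build_tensor(weights, vectors), k=len(weights))`
on a tensor built from orthonormal `vectors` should return those vectors and
weights again, in some order. Each inner iteration is a fixed number of
multiply-adds, which gives a rough sense of why this step can be fast once the
tensor has been built.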
> Add the Spectral LDA algorithm
> ------------------------------
>
> Key: SPARK-18599
> URL: https://issues.apache.org/jira/browse/SPARK-18599
> Project: Spark
> Issue Type: New Feature
> Components: MLlib
> Reporter: Jencir Lee
> Labels: lda
>
> The Spectral LDA algorithm transforms the LDA problem into an orthogonal tensor
> decomposition problem. [[Anandkumar 2012]] establishes a theoretical guarantee
> for the convergence of orthogonal tensor decomposition.
> This algorithm first builds the 2nd-order and 3rd-order moments from the
> empirical word counts, orthogonalizes them, and finally performs the tensor
> decomposition on the empirical data moments. The whole procedure is purely
> linear algebra and can leverage machine-native BLAS/LAPACK libraries (Spark
> needs to be compiled with the `-Pnetlib-lgpl` option).
> It achieves competitive log-perplexity vs Online Variational Inference in a
> shorter time. It also has clean memory usage: as of v2.0.0 we have
> experienced crashes due to memory problems with the built-in Gibbs Sampler or
> Online Variational Inference, but never with the Spectral LDA algorithm. This
> algorithm scales linearly.
> The original repo is at
> https://github.com/FurongHuang/SpectralLDA-TensorSpark. We refactored it to
> follow the Spark coding style and interfaces when porting it over for the PR. We wrote a
> report describing the algorithm in detail and listing test results at
> https://www.overleaf.com/read/wscdvwrjmtmw. It's going to enter our official
> repo soon.
> REFERENCES
> Anandkumar, Anima, et al., Tensor decompositions for learning latent variable
> models, 2012, https://arxiv.org/abs/1210.7559.