[ 
https://issues.apache.org/jira/browse/SPARK-18599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15699952#comment-15699952
 ] 

Apache Spark commented on SPARK-18599:
--------------------------------------

User 'jli05' has created a pull request for this issue:
https://github.com/apache/spark/pull/16023

> Add the Spectral LDA algorithm
> ------------------------------
>
>                 Key: SPARK-18599
>                 URL: https://issues.apache.org/jira/browse/SPARK-18599
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>            Reporter: Jencir Lee
>              Labels: lda
>
> The Spectral LDA algorithm transforms the LDA problem into an orthogonal 
> tensor decomposition problem. [[Anandkumar 2012]] establishes theoretical 
> guarantees for the convergence of orthogonal tensor decomposition. 
> This algorithm first builds the 2nd-order and 3rd-order moments from the 
> empirical word counts, orthogonalizes them, and finally performs the tensor 
> decomposition on the empirical data moments. The whole procedure consists 
> purely of linear algebra and can leverage machine-native BLAS/LAPACK 
> libraries (Spark needs to be compiled with the `-Pnetlib-lgpl` option).
> It achieves log-perplexity competitive with Online Variational Inference in 
> the shortest time. Its memory usage is also well behaved: as of v2.0.0 we 
> have experienced crashes due to memory problems with the built-in Gibbs 
> Sampler or Online Variational Inference, but never with the Spectral LDA 
> algorithm. The algorithm scales linearly. 
> The original repo is at 
> https://github.com/FurongHuang/SpectralLDA-TensorSpark. We refactored the 
> code to follow the Spark coding style and interfaces when porting it over 
> for the PR. We wrote a report describing the algorithm in detail and listing 
> test results at https://www.overleaf.com/read/wscdvwrjmtmw. It will be added 
> to our official repo soon.
> REFERENCES
> Anandkumar, Anima, et al., Tensor decompositions for learning latent variable 
> models, 2012, https://arxiv.org/abs/1210.7559.
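
For readers unfamiliar with the final step described in the issue, here is a
minimal sketch of orthogonal tensor decomposition by power iteration with
deflation, the kind of method analyzed in Anandkumar et al. It is plain
Python, not the Scala code from the pull request, and the PR may well use a
different decomposition routine. The function names (build_tensor, contract,
power_iteration, decompose) are made up for illustration, the input tensor is
synthetic, and the sketch assumes the 3rd-order moment has already been
orthogonalized so that its components are orthonormal.

```python
import math
import random


def build_tensor(weights, components):
    """Form T = sum_i w_i * a_i (x) a_i (x) a_i as a nested d x d x d list."""
    d = len(components[0])
    return [[[sum(w * a[i] * a[j] * a[l] for w, a in zip(weights, components))
              for l in range(d)] for j in range(d)] for i in range(d)]


def contract(T, v):
    """Return the vector T(I, v, v): contract the last two modes of T with v."""
    d = len(v)
    return [sum(T[i][j][l] * v[j] * v[l] for j in range(d) for l in range(d))
            for i in range(d)]


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def power_iteration(T, rng, restarts=10, iters=50):
    """Iterate v <- T(I, v, v) / ||T(I, v, v)|| from several random starts
    and keep the candidate with the largest value of T(v, v, v)."""
    d = len(T)
    best_lam, best_v = -math.inf, None
    for _ in range(restarts):
        v = normalize([rng.gauss(0.0, 1.0) for _ in range(d)])
        for _ in range(iters):
            v = normalize(contract(T, v))
        lam = sum(x * y for x, y in zip(v, contract(T, v)))  # T(v, v, v)
        if lam > best_lam:
            best_lam, best_v = lam, v
    return best_lam, best_v


def decompose(T, k, seed=0):
    """Recover k (eigenvalue, eigenvector) pairs by power iteration plus deflation."""
    rng = random.Random(seed)
    d = len(T)
    pairs = []
    for _ in range(k):
        lam, v = power_iteration(T, rng)
        pairs.append((lam, v))
        # Deflate: subtract the recovered rank-one term before the next round.
        T = [[[T[i][j][l] - lam * v[i] * v[j] * v[l]
               for l in range(d)] for j in range(d)] for i in range(d)]
    return pairs


# Synthetic example (assumption): an orthonormal basis of R^3, made-up weights.
components = [[1/3, 2/3, 2/3], [2/3, 1/3, -2/3], [2/3, -2/3, 1/3]]
weights = [3.0, 2.0, 1.0]
T = build_tensor(weights, components)
for lam, v in decompose(T, k=3):
    print(round(lam, 6), [round(x, 6) for x in v])
```

In the LDA setting the recovered pairs would still have to be mapped back
through the orthogonalization step to obtain topic-word distributions; that
part is omitted here.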



