[
https://issues.apache.org/jira/browse/SPARK-5564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14342311#comment-14342311
]
Debasish Das edited comment on SPARK-5564 at 3/1/15 4:19 PM:
-------------------------------------------------------------
I am currently using the following PR to do large-rank matrix factorization
with various constraints. I am not sure the current ALS will scale to large
ranks, but I am keen to compare the exact formulation with the graphx-based LDA flow:
https://github.com/scalanlp/breeze/pull/364
The idea here is to solve the constrained factorization problem as explained in
Vorontsov and Potapenko, alternating between:

  minimize f(w, h*)
  s.t. 1'w = 1, w >= 0  (row constraints)

  minimize f(w*, h)
  s.t. 0 <= h <= 1, with each column of h normalized
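The alternating solves above each end with a projection onto the constraint set. A minimal NumPy sketch of the two projections, assuming a Euclidean simplex projection (Duchi et al., 2008) for the rows of w and a clip-then-normalize step for the columns of h (function names are illustrative, not the Breeze PR's API):

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex
    {w : sum(w) = 1, w >= 0} (Duchi et al., 2008)."""
    u = np.sort(v)[::-1]                      # sort descending
    css = np.cumsum(u)
    # largest index rho with u[rho] * (rho+1) > css[rho] - 1
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - 1.0))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)

def project_columns(h):
    """Clip h into [0, 1], then normalize each column to sum to 1."""
    h = np.clip(h, 0.0, 1.0)
    col_sums = h.sum(axis=0)
    col_sums[col_sums == 0] = 1.0             # avoid division by zero
    return h / col_sums
```

In the alternating scheme, `project_simplex` would be applied to each row of w after its unconstrained update, and `project_columns` to h after its update.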
Here I want f(w,h) to be a MAP loss; I already solved the least-squares variant
in https://issues.apache.org/jira/browse/SPARK-2426 and got a good improvement
in MAP statistics. I expect perplexity will improve here as well.
If no one else is looking into it, I would like to compare the join-based
factorization flow (ml.recommendation.ALS) with the graphx-based LDA flow.
In fact, if you think the LDA-based flow will be more efficient than the
join-based factorization flow for large ranks, I can implement stochastic
matrix factorization directly on top of LDA and add both the least-squares
and MAP losses.
> Support sparse LDA solutions
> ----------------------------
>
> Key: SPARK-5564
> URL: https://issues.apache.org/jira/browse/SPARK-5564
> Project: Spark
> Issue Type: Improvement
> Components: MLlib
> Affects Versions: 1.3.0
> Reporter: Joseph K. Bradley
>
> Latent Dirichlet Allocation (LDA) currently requires that the priors’
> concentration parameters be > 1.0. It should support values > 0.0, which
> should encourage sparser topics (phi) and document-topic distributions
> (theta).
> For EM, this will require adding a projection to the M-step, as in: Vorontsov
> and Potapenko. "Tutorial on Probabilistic Topic Modeling : Additive
> Regularization for Stochastic Matrix Factorization." 2014.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)