[ https://issues.apache.org/jira/browse/SPARK-5564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14342311#comment-14342311 ]
Debasish Das edited comment on SPARK-5564 at 3/1/15 4:19 PM:
-------------------------------------------------------------

I am currently using the following PR to do large-rank matrix factorization with various constraints. I am not sure the current ALS will scale to large ranks, but I am keen to compare the exact formulation against the GraphX-based LDA flow: https://github.com/scalanlp/breeze/pull/364

The idea is to solve the constrained factorization problem as explained in Vorontsov and Potapenko:

minimize f(w, h*) s.t. 1'w = 1, w >= 0 (row constraints)
minimize f(w*, h) s.t. 0 <= h <= 1, then normalize each column of h

Here I want f(w, h) to be the MAP loss, but I already solved the least-squares variant in https://issues.apache.org/jira/browse/SPARK-2426 and got a good improvement in MAP statistics. I expect perplexity to improve here as well.

If no one else is looking into it, I would like to compare the join-based factorization flow (ml.recommendation.ALS) with the GraphX-based LDA flow. In fact, if you think the LDA-based flow will be more efficient than the join-based factorization flow for large ranks, I can implement stochastic matrix factorization directly on top of LDA and add both the least-squares and MAP losses.

> Support sparse LDA solutions
> ----------------------------
>
> Key: SPARK-5564
> URL: https://issues.apache.org/jira/browse/SPARK-5564
> Project: Spark
> Issue Type: Improvement
> Components: MLlib
> Affects Versions: 1.3.0
> Reporter: Joseph K. Bradley
>
> Latent Dirichlet Allocation (LDA) currently requires that the priors'
> concentration parameters be > 1.0. It should support values > 0.0, which
> should encourage sparser topics (phi) and document-topic distributions (theta).
> For EM, this will require adding a projection to the M-step, as in: Vorontsov
> and Potapenko. "Tutorial on Probabilistic Topic Modeling: Additive
> Regularization for Stochastic Matrix Factorization." 2014.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
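To make the two constrained subproblems in the comment concrete, here is a minimal sketch of alternating projected-gradient updates for the least-squares variant of f(w, h): each row of W is projected onto the probability simplex (1'w = 1, w >= 0) using the sort-based Euclidean projection, and H is clipped to [0, 1] and then column-normalized. This is illustrative Python/NumPy, not Spark or breeze code; the function names and the 1/L step-size choice are assumptions, not part of any existing API.

```python
import numpy as np

def project_simplex(v):
    # Euclidean projection of v onto {w : w >= 0, 1'w = 1}
    # (standard sort-based algorithm) -- illustrative helper
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, v.size + 1) > css - 1.0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def constrained_factorize(X, rank, iters=100, seed=0):
    # Alternating projected-gradient sketch for f(W, H) = 0.5*||X - W H||_F^2
    # with row-simplex constraints on W and box + column-normalization on H.
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = np.apply_along_axis(project_simplex, 1, rng.uniform(size=(m, rank)))
    H = rng.uniform(size=(rank, n))
    H /= H.sum(axis=0, keepdims=True)
    for _ in range(iters):
        # W-step: gradient w.r.t. W, step 1/L, then project each row onto the simplex
        lr = 1.0 / (np.linalg.norm(H @ H.T, 2) + 1e-9)
        W = np.apply_along_axis(project_simplex, 1, W - lr * (W @ H - X) @ H.T)
        # H-step: gradient w.r.t. H, clip to [0, 1], then normalize each column
        lr = 1.0 / (np.linalg.norm(W.T @ W, 2) + 1e-9)
        H = np.clip(H - lr * W.T @ (W @ H - X), 0.0, 1.0)
        H /= np.maximum(H.sum(axis=0, keepdims=True), 1e-12)
    return W, H
```

For the MAP loss discussed above, only the gradient expressions would change; the projection and normalization steps stay the same, which is what makes the projected M-step in the issue description applicable to both losses.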