[ https://issues.apache.org/jira/browse/SPARK-6323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360956#comment-14360956 ]
Debasish Das commented on SPARK-6323:
-------------------------------------
g(z) is not regularization. For now it covers constraints such as z >= 0;
lb <= z <= ub; 1'z = s with z >= 0; and L1(z). These are the same constraints
I supported through QuadraticMinimizer for SPARK-2426. I have already migrated
ALS to use QuadraticMinimizer (default) and NNLS (positive), and I am waiting
for the next Breeze release.
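To make the constraint list concrete, here is a minimal sketch of the projection or proximal operator behind each one. It is plain Python for illustration only, not the Breeze Proximal API; the function names are hypothetical.

```python
def project_nonnegative(z):
    """Projection onto {z : z >= 0}."""
    return [max(v, 0.0) for v in z]


def project_box(z, lb, ub):
    """Projection onto {z : lb <= z <= ub} with elementwise bounds."""
    return [min(max(v, lo), hi) for v, lo, hi in zip(z, lb, ub)]


def project_simplex(z, s=1.0):
    """Projection onto {z : 1'z = s, z >= 0} using the sort-based method."""
    u = sorted(z, reverse=True)
    cumsum = 0.0
    theta = 0.0
    for j, uj in enumerate(u, start=1):
        cumsum += uj
        t = (cumsum - s) / j
        if uj - t > 0:
            theta = t  # the last j passing this test defines the shift
    return [max(v - theta, 0.0) for v in z]


def soft_threshold(z, lam):
    """Proximal operator of lam * L1(z): shrink each entry toward zero."""
    out = []
    for v in z:
        if v > lam:
            out.append(v - lam)
        elif v < -lam:
            out.append(v + lam)
        else:
            out.append(0.0)
    return out
```

Each function maps an arbitrary point to the nearest feasible point (or, for L1, applies the shrinkage step), which is all a splitting solver needs from g(z).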
I call it z because the solve uses splitting algorithms, which keep a separate
copy of the variable for the constraint (projection-based, or ADMM + proximal).
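The role of z is easiest to see in a small ADMM loop for min 0.5x'Hx + c'x + g(z) subject to x = z. The following is a sketch under stated assumptions: plain Python with a dense Gaussian-elimination solve, a fixed penalty rho, a fixed iteration count, and g taken as the z >= 0 constraint. It is not the QuadraticMinimizer implementation, and all names are hypothetical.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def project_nonnegative(v):
    """Proximal operator of the z >= 0 constraint (a projection)."""
    return [max(t, 0.0) for t in v]


def admm_quadratic(H, c, prox, rho=1.0, iters=200):
    """Scaled-form ADMM: x handles the quadratic, z handles g, u is the dual."""
    n = len(c)
    A = [[H[i][j] + (rho if i == j else 0.0) for j in range(n)] for i in range(n)]
    z = [0.0] * n
    u = [0.0] * n
    for _ in range(iters):
        # x-update: (H + rho I) x = rho (z - u) - c
        x = solve_linear(A, [rho * (z[i] - u[i]) - c[i] for i in range(n)])
        # z-update: apply the proximal operator of g to x + u
        z = prox([x[i] + u[i] for i in range(n)])
        # dual update: accumulate the gap between x and z
        u = [u[i] + x[i] - z[i] for i in range(n)]
    return z


H = [[2.0, 1.0], [1.0, 2.0]]
c = [-3.0, 3.0]
solution = admm_quadratic(H, c, project_nonnegative)
```

Without the constraint this problem has minimizer (3, -3); with z >= 0 the second coordinate is clamped to 0 and the first moves to 1.5, which is what the loop approaches. Swapping `prox` for another operator changes the constraint without touching the x-update.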
Sure. For papers on the global objective, refer to any PLSA paper that uses
matrix factorization. I personally like these two and am focused on them:
1. Tutorial on Probabilistic Topic Modeling: Additive Regularization for
Stochastic Matrix Factorization, Equations (2) and (3)
2. The original PLSA paper by Hofmann.
For large-rank matrix factorization, I think the requirement now comes from
sparse topic models, where the number of topics can easily reach ~10K.
> Large rank matrix factorization with Nonlinear loss and constraints
> -------------------------------------------------------------------
>
> Key: SPARK-6323
> URL: https://issues.apache.org/jira/browse/SPARK-6323
> Project: Spark
> Issue Type: New Feature
> Components: ML, MLlib
> Affects Versions: 1.4.0
> Reporter: Debasish Das
> Fix For: 1.4.0
>
> Original Estimate: 672h
> Remaining Estimate: 672h
>
> Currently ml.recommendation.ALS is optimized for Gram matrix generation, which
> scales only to modest ranks. The problems that we can solve are in the normal
> equation/quadratic form: 0.5x'Hx + c'x + g(z),
> where g(z) can be one of the constraints from the Breeze proximal library:
> https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/optimize/proximal/Proximal.scala
> In this PR we will re-use the ml.recommendation.ALS design and come up with
> ml.recommendation.ALM (Alternating Minimization). Thanks to [~mengxr]'s recent
> changes, this is now straightforward.
> ALM will be capable of solving problems of the form: min f(x) + g(z)
> 1. The loss function f(x) can be LeastSquareLoss, LoglikelihoodLoss or
> HingeLoss. Most likely we will re-use the Gradient interfaces already defined
> and implement LoglikelihoodLoss.
> 2. The constraints g(z) supported are the same as above, except that we don't
> yet support affine + bounds: Aeq x = beq, lb <= x <= ub. Most likely we
> don't need that for ML applications.
> 3. For the solver we will use breeze.optimize.proximal.NonlinearMinimizer,
> which in turn uses a projection-based solver (SPG) or a proximal solver (ADMM),
> depending on convergence speed.
> https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/optimize/proximal/NonlinearMinimizer.scala
> 4. The factors will be SparseVector so that we keep shuffle size in check.
> For example we will run with 10K ranks but we will force factors to be
> 100-sparse.
> This is closely related to Sparse LDA
> https://issues.apache.org/jira/browse/SPARK-5564 with the difference that we
> are not using graph representation here.
> As we do scaling experiments, we will understand which flow is better suited
> as ratings get denser. My understanding is that since we have already scaled
> ALS to 2 billion ratings and we will keep sparsity in check, the same
> 2-billion-rating flow will scale to 10K ranks as well.
> This JIRA is intended to extend the capabilities of ml recommendation to
> generalized loss functions.
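On item 4 of the description, the issue does not say how factors are forced to be 100-sparse. One possibility, shown here purely as an assumption, is to keep the k largest-magnitude entries and store them in index/value form, similar in spirit to a SparseVector. The helper name is hypothetical and the sketch is plain Python.

```python
def keep_top_k(dense, k):
    """Keep the k largest-magnitude nonzero entries of a dense factor.

    Returns (size, indices, values) with indices in increasing order.
    """
    order = sorted(range(len(dense)), key=lambda i: abs(dense[i]), reverse=True)
    kept = sorted(i for i in order[:k] if dense[i] != 0.0)
    return len(dense), kept, [dense[i] for i in kept]


size, indices, values = keep_top_k([0.1, -3.0, 0.0, 2.0, 0.5], 2)
```

With rank 10K and k = 100, each factor would carry 100 indices and 100 values rather than 10,000 values, which is the shuffle-size saving the description is after.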