[
https://issues.apache.org/jira/browse/MADLIB-1351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16909396#comment-16909396
]
Himanshu Pandey edited comment on MADLIB-1351 at 8/19/19 4:04 PM:
------------------------------------------------------------------
[~fmcquillan]
{code:java}
If 'iter_num=5' and 'evaluate_every=1', then 'perplexity_iters' value would be
{1,2,3,4,5}{code}
However, we also update the model table one final time after all
iterations are completed, so perplexity will have 6 values, something like
this:
{code:java}
{74.9531135523,70.7078733742,69.531331269,68.3480936661,72.3446381087,68.940249051}{code}
What should we record in perplexity_iters for this "final update" of the model
table?
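
To make the mismatch concrete, here is a minimal Python sketch. It is not MADlib code: the helper name `expected_perplexity_iters` is hypothetical, and it assumes perplexity is evaluated at every multiple of 'evaluate_every' (the comment above only confirms the 'evaluate_every=1' case).

```python
def expected_perplexity_iters(iter_num, evaluate_every):
    """Return the 1-based iterations at which perplexity would be evaluated.

    Assumption: evaluation happens at every multiple of evaluate_every,
    and a value of 0 or less disables evaluation entirely.
    """
    if evaluate_every <= 0:
        return []
    return list(range(evaluate_every, iter_num + 1, evaluate_every))


# Values taken from the comment above: 5 iterations, evaluated every iteration.
iters = expected_perplexity_iters(5, 1)
perplexity = [74.9531135523, 70.7078733742, 69.531331269,
              68.3480936661, 72.3446381087, 68.940249051]

# Number of perplexity values that have no matching entry in perplexity_iters.
unlabeled = len(perplexity) - len(iters)
print(iters, unlabeled)
```

With these inputs, one perplexity value (the one from the final model table update) has no iteration number to pair with, which is the open question.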
was (Author: [email protected]):
[~fmcquillan]
{code}If 'iter_num=5' and 'evaluate_every=1', then 'perplexity_iters' value
would be {1,2,3,4,5}{code}
However, we also update the model table one final time after all
iterations are completed, so perplexity will have 6 values, something like
this:
{code:java}
{73.7550415613786,70.5237666023843,70.6146354978257,71.6661000896055,69.7403205794835,
72.8881000896057}{code}
What should we record in perplexity_iters for this "final update" of the model
table?
> Add stopping criteria on perplexity to LDA
> ------------------------------------------
>
> Key: MADLIB-1351
> URL: https://issues.apache.org/jira/browse/MADLIB-1351
> Project: Apache MADlib
> Issue Type: Improvement
> Components: Module: Parallel Latent Dirichlet Allocation
> Reporter: Frank McQuillan
> Assignee: Himanshu Pandey
> Priority: Major
> Fix For: v1.17
>
>
> In LDA
> http://madlib.apache.org/docs/latest/group__grp__lda.html
> add a stopping criterion based on perplexity rather than only the number of iterations.
> Suggested approach is to do what scikit-learn does
> https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.LatentDirichletAllocation.html
> evaluate_every : int, optional (default=0)
> How often to evaluate perplexity. Set it to 0 or negative number to not
> evaluate perplexity in training at all. Evaluating perplexity can help you
> check convergence in training process, but it will also increase total
> training time. Evaluating perplexity in every iteration might increase
> training time up to two-fold.
> perplexity_tol : float, optional (default=1e-1)
> Perplexity tolerance to stop iterating. Only used when evaluate_every is
> greater than 0.
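
The two parameters quoted above could drive a training loop roughly as follows. This is a sketch under stated assumptions, not the MADlib or scikit-learn implementation: `run_iteration` and `compute_perplexity` are hypothetical callables standing in for one LDA pass and a perplexity evaluation, and the loop stops once two consecutive evaluated perplexities differ by less than 'perplexity_tol'.

```python
def train_with_perplexity_stop(run_iteration, compute_perplexity, iter_num,
                               evaluate_every=0, perplexity_tol=0.1):
    """Run up to iter_num iterations, stopping early on small perplexity change.

    run_iteration and compute_perplexity are hypothetical callables that take
    the 1-based iteration number.
    """
    perplexity_iters = []  # iterations at which perplexity was evaluated
    perplexity = []        # the corresponding perplexity values
    last = None
    for it in range(1, iter_num + 1):
        run_iteration(it)
        # Evaluate only when enabled and only every evaluate_every iterations.
        if evaluate_every > 0 and it % evaluate_every == 0:
            current = compute_perplexity(it)
            perplexity_iters.append(it)
            perplexity.append(current)
            # Stop when the change since the last evaluation is below tolerance.
            if last is not None and abs(last - current) < perplexity_tol:
                break
            last = current
    return perplexity_iters, perplexity


# Usage with made-up perplexity values, for illustration only.
fake_values = {1: 80.0, 2: 75.0, 3: 74.95, 4: 74.0}
iters_run, values_seen = train_with_perplexity_stop(
    run_iteration=lambda it: None,
    compute_perplexity=fake_values.__getitem__,
    iter_num=4,
    evaluate_every=1,
    perplexity_tol=0.1,
)
print(iters_run, values_seen)
```

Because this sketch records the iteration number and the perplexity value together, the two arrays always have the same length.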
--
This message was sent by Atlassian Jira
(v8.3.2#803003)