GitHub user takuti commented on the issue:
https://github.com/apache/incubator-hivemall/pull/71
I've realized that the main difference between the following two papers lies in
**how to initialize P(w|z) for newly observed words**.
- [Incremental Probabilistic Latent Semantic Analysis for Automatic Question Recommendation](https://pdfs.semanticscholar.org/b66e/c7faf2e4888503f7ad1537d284f350fb3e58.pdf)
- [Using Incremental PLSI for Threshold-Resilient Online Event Analysis](https://pdfs.semanticscholar.org/a258/b33e285da2e93b59e50311d50ff46045a38b.pdf)
The former (i.e., the current implementation) simply initializes them with random
values. The previous P(w|z) can additionally be incorporated by setting a
hyper-parameter `alpha` if desired (`alpha=0` is also possible).
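To make the idea concrete, here is a minimal, hypothetical sketch of that initialization scheme; the class/method names (`PwzInitSketch`, `initWordTopicProb`) and the `prevPwz` map are illustrative and are not the actual PR code:

```java
import java.util.Map;
import java.util.Random;

// Hypothetical sketch, not the actual PR code: initialize P(w|z) for a newly
// observed word over `numTopics` topics with random values, optionally mixing
// in the previous estimate weighted by `alpha`. Per-topic normalization over
// the whole vocabulary (so that sum_w P(w|z) = 1) is assumed to happen later
// in the M-step.
final class PwzInitSketch {
    static float[] initWordTopicProb(String word, int numTopics, float alpha,
                                     Map<String, float[]> prevPwz, Random rnd) {
        final float[] pwz = new float[numTopics];
        final float[] prev = prevPwz.get(word); // null if the word is unseen so far
        for (int z = 0; z < numTopics; z++) {
            float v = rnd.nextFloat(); // random initialization
            if (prev != null) {
                v += alpha * prev[z];  // alpha = 0 drops the previous P(w|z)
            }
            pwz[z] = v;
        }
        return pwz;
    }
}
```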
Meanwhile, the latter requires a certain fold-in procedure to compute a
"better" P(w|z) based on a window size. IMO, this approach is too complex for
our goal (i.e., implementing a pLSA UDTF that repeats EM iterations over the
same set of mini-batches).
Thus, I will finalize this PR with the current implementation.
Todo:
- [ ] Double-check if the algorithm is implemented correctly
- [ ] Documentation
  - Differences from LDA
  - Explain the effect of `alpha`