Github user helenahm commented on the issue:
https://github.com/apache/incubator-hivemall/pull/93
I will do more tests too, as I actually need the model for a project, so I
plan to test it under load as well. I will write about the results.
It may have issues similar to those Random Forest has. You are right: in a
nutshell, the implementation and the memory concerns are similar.
The implementation is as scalable as the Random Forest implementation:
one or more models are trained per mapper, and then a UDAF combines all the
learned models into one final model.
I still use Random Forest, even though _numTrees greater than 1 does not
work for me on EMR r4 machines with my dataset. I think MaxEnt will give me a
better model, though, since I will not have to worry about overfitting due to
the tree structure, etc.
Iterative Scaling could also be re-implemented from scratch, without using
any third-party software. That is another option.
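To give a sense of what a from-scratch rewrite would involve, here is a short sketch of Generalized Iterative Scaling (GIS) for a conditional maxent model, the training procedure used in Ratnaparkhi's work. The function names and toy data are my own assumptions, and to keep the sketch small it assumes every example activates the same number of features, so no correction (slack) feature is needed.

```python
# Illustrative GIS sketch, not Hivemall or OpenNLP code.
import math
from collections import defaultdict

def gis_train(data, labels, iterations=100):
    """data: list of lists of active feature names; labels: parallel list."""
    C = len(data[0])
    assert all(len(x) == C for x in data), "equal feature counts assumed"
    n = len(data)
    label_set = sorted(set(labels))
    # Empirical expectations E~[f] over (feature, label) pairs.
    emp = defaultdict(float)
    for x, y in zip(data, labels):
        for w in x:
            emp[(w, y)] += 1.0 / n
    lam = defaultdict(float)  # one weight per (feature, label) pair

    def posterior(x):
        scores = {y: math.exp(sum(lam[(w, y)] for w in x)) for y in label_set}
        z = sum(scores.values())
        return {y: s / z for y, s in scores.items()}

    for _ in range(iterations):
        # Model expectations E_p[f] under the current weights.
        model = defaultdict(float)
        for x in data:
            for y, py in posterior(x).items():
                for w in x:
                    model[(w, y)] += py / n
        # GIS multiplicative update, written in log space.
        for f in emp:
            lam[f] += math.log(emp[f] / model[f]) / C
    return lam, posterior

data = [["sunny", "warm"], ["sunny", "cold"],
        ["rainy", "cold"], ["rainy", "warm"]]
labels = [1, 1, 0, 0]
weights, predict = gis_train(data, labels)
print(predict(["sunny", "warm"])[1])  # approaches 1.0 on this separable toy set
```

Each iteration only needs one pass over the data to compute model expectations, which is what makes the update pattern a natural fit for a map/aggregate rewrite.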
I am sure the NLP community will be more likely to accept this
implementation and will use it exactly as those authors wrote it. We very much
value Adwait Ratnaparkhi's work: many published articles use exactly that
MaxEnt implementation, which means people will be able to use HiveMall and
compare their newer results with the results of their previous work.