Github user takuti commented on the issue:
https://github.com/apache/incubator-hivemall/pull/76
Tried different hyper-parameters and now obtained much better results:
```sql
select
  train_plsa(features, "-topics 20 -iter 10 -s 128 -delta 0.01 -alpha 512 -eps 0.1")
    as (label, word, prob)
```
```
label word prob
0 time 0.05997549742460251
0 way 0.05171169713139534
0 use 0.035251982510089874
0 value 0.02848050743341446
0 used 0.025441156700253487
1 chip 0.09532734751701355
1 clipper 0.05711857229471207
1 algorithm 0.051710840314626694
1 key 0.05102092772722244
1 use 0.04907272756099701
2 good 0.1450035274028778
2 like 0.05599432438611984
2 make 0.04267672821879387
2 just 0.03836267814040184
2 thing 0.031985796988010406
3 article 0.048415664583444595
3 based 0.03658626228570938
3 group 0.030994825065135956
3 read 0.026976328343153
3 example 0.021381204947829247
4 just 0.07593566179275513
4 ve 0.07072831690311432
4 good 0.05930205062031746
4 going 0.03802431374788284
4 think 0.037296053022146225
5 people 0.04686562716960907
5 right 0.04199033975601196
5 israel 0.031667809933423996
5 public 0.0292668379843235
5 white 0.029161151498556137
6 edu 0.10378731042146683
6 net 0.05129273235797882
6 mail 0.049513332545757294
6 com 0.04815826937556267
6 phone 0.04432467743754387
7 know 0.08519471436738968
7 like 0.07772345095872879
7 think 0.060362670570611954
7 don 0.0590314120054245
7 does 0.04994097724556923
8 excellent 0.08700280636548996
8 new 0.06730616837739944
8 included 0.04151006042957306
8 cover 0.03946089372038841
8 quality 0.02619233727455139
9 time 0.04681145027279854
9 did 0.02621248923242092
9 little 0.0245593823492527
9 new 0.02417983114719391
9 far 0.02375759556889534
10 year 0.05394640192389488
10 game 0.03764347359538078
10 good 0.03315029293298721
10 average 0.032980453222990036
10 second 0.03197808936238289
11 75 0.0755339190363884
11 14 0.05241614207625389
11 mr 0.046052053570747375
11 ll 0.04552857577800751
11 25 0.04117002338171005
12 day 0.0393703319132328
12 people 0.03855934739112854
12 said 0.024915607646107674
12 world 0.023490646854043007
12 turkey 0.02076609991490841
13 want 0.04342740774154663
13 don 0.036026883870363235
13 right 0.0349094532430172
13 know 0.03472379595041275
13 believe 0.03130199387669563
14 50 0.17409634590148926
14 00 0.11029791086912155
14 10 0.04991569370031357
14 30 0.040771979838609695
14 15 0.0406755730509758
15 problem 0.05924408510327339
15 thanks 0.05560215935111046
15 does 0.04438205063343048
15 using 0.03396100923418999
15 use 0.031639792025089264
16 scsi 0.1266227811574936
16 video 0.03675418719649315
16 card 0.03637675568461418
16 port 0.033804234117269516
16 driver 0.030237805098295212
17 program 0.04696282744407654
17 software 0.03828095644712448
17 write 0.03822388872504234
17 version 0.03776012361049652
17 file 0.03350536897778511
18 law 0.03776385635137558
18 court 0.031376246362924576
18 question 0.027072720229625702
18 control 0.02526409737765789
18 rate 0.021629702299833298
19 god 0.09157539159059525
19 believe 0.03708299621939659
19 church 0.03347333148121834
19 bible 0.032943107187747955
19 true 0.03110317699611187
```
Setting a larger value for `-alpha` is effective for avoiding overfitting. I
mentioned this point in the documentation.
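For intuition on why a larger `-alpha` helps (this is only an illustration of additive smoothing in general, not Hivemall's exact pLSA update rule, and the class and method names below are hypothetical):

```java
import java.util.Arrays;

public class AlphaSmoothing {

    /**
     * Additive (Dirichlet-style) smoothing: a larger alpha pulls the
     * estimated distribution toward uniform, which damps overfitting
     * to words that happen to be rare in the training data.
     */
    static double[] smooth(double[] counts, double alpha) {
        double total = alpha * counts.length;
        for (double c : counts) {
            total += c;
        }
        double[] p = new double[counts.length];
        for (int i = 0; i < counts.length; i++) {
            p[i] = (counts[i] + alpha) / total;
        }
        return p;
    }

    public static void main(String[] args) {
        double[] counts = {9, 1, 0};
        // alpha = 0: raw maximum-likelihood estimate, overfits the zero count
        System.out.println(Arrays.toString(smooth(counts, 0.0)));
        // alpha = 512: heavily smoothed, close to uniform
        System.out.println(Arrays.toString(smooth(counts, 512.0)));
    }
}
```

With `alpha = 0` the third word gets probability exactly zero, while a large alpha keeps all probabilities well away from the extremes.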
Currently, if a user sets inappropriate hyper-parameters and training ends up
in a bad state (i.e., infinite perplexity), the UDF simply throws an exception
at
[HERE](https://github.com/takuti/incubator-hivemall/blob/6139ded111e8b7e872766fc96a4cc58d77aa578b/core/src/main/java/hivemall/topicmodel/IncrementalPLSAModel.java#L266-L267).
Is that okay, @myui? If just logging a warning message is preferable, I will
update it accordingly.
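The warning-only variant could look roughly like the sketch below. This is not the actual `IncrementalPLSAModel` code; the class, method, and message are hypothetical, and the real implementation would use the project's logger rather than `System.err`:

```java
public class PerplexityCheck {

    /**
     * Returns true when the perplexity value is usable. Instead of
     * throwing, an infinite or NaN perplexity is reported as a warning
     * so the caller can decide whether to abort or keep the last model.
     */
    static boolean isUsablePerplexity(double perplexity) {
        if (Double.isInfinite(perplexity) || Double.isNaN(perplexity)) {
            System.err.println("WARN: perplexity diverged (" + perplexity
                + "); consider adjusting hyper-parameters such as -alpha or -delta");
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isUsablePerplexity(42.0));                    // usable
        System.out.println(isUsablePerplexity(Double.POSITIVE_INFINITY)); // warns
    }
}
```

Whether this is preferable depends on policy: an exception fails fast and forces the user to re-tune, while a warning lets a long-running job finish with a possibly degraded model.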