Github user takuti commented on the issue:
https://github.com/apache/incubator-hivemall/pull/76
Tried different hyper-parameters and now obtained much better results:
```sql
select
  train_plsa(features, "-topics 20 -iter 10 -s 128 -delta 0.01 -alpha 512 -eps 0.1")
    as (label, word, prob)
```
```
label word prob
0 time 0.05997549742460251
0 way 0.05171169713139534
0 use 0.035251982510089874
0 value 0.02848050743341446
0 used 0.025441156700253487
1 chip 0.09532734751701355
1 clipper 0.05711857229471207
1 algorithm 0.051710840314626694
1 key 0.05102092772722244
1 use 0.04907272756099701
2 good 0.1450035274028778
2 like 0.05599432438611984
2 make 0.04267672821879387
2 just 0.03836267814040184
2 thing 0.031985796988010406
3 article 0.048415664583444595
3 based 0.03658626228570938
3 group 0.030994825065135956
3 read 0.026976328343153
3 example 0.021381204947829247
4 just 0.07593566179275513
4 ve 0.07072831690311432
4 good 0.05930205062031746
4 going 0.03802431374788284
4 think 0.037296053022146225
5 people 0.04686562716960907
5 right 0.04199033975601196
5 israel 0.031667809933423996
5 public 0.0292668379843235
5 white 0.029161151498556137
6 edu 0.10378731042146683
6 net 0.05129273235797882
6 mail 0.049513332545757294
6 com 0.04815826937556267
6 phone 0.04432467743754387
7 know 0.08519471436738968
7 like 0.07772345095872879
7 think 0.060362670570611954
7 don 0.0590314120054245
7 does 0.04994097724556923
8 excellent 0.08700280636548996
8 new 0.06730616837739944
8 included 0.04151006042957306
8 cover 0.03946089372038841
8 quality 0.02619233727455139
9 time 0.04681145027279854
9 did 0.02621248923242092
9 little 0.0245593823492527
9 new 0.02417983114719391
9 far 0.02375759556889534
10 year 0.05394640192389488
10 game 0.03764347359538078
10 good 0.03315029293298721
10 average 0.032980453222990036
10 second 0.03197808936238289
11 75 0.0755339190363884
11 14 0.05241614207625389
11 mr 0.046052053570747375
11 ll 0.04552857577800751
11 25 0.04117002338171005
12 day 0.0393703319132328
12 people 0.03855934739112854
12 said 0.024915607646107674
12 world 0.023490646854043007
12 turkey 0.02076609991490841
13 want 0.04342740774154663
13 don 0.036026883870363235
13 right 0.0349094532430172
13 know 0.03472379595041275
13 believe 0.03130199387669563
14 50 0.17409634590148926
14 00 0.11029791086912155
14 10 0.04991569370031357
14 30 0.040771979838609695
14 15 0.0406755730509758
15 problem 0.05924408510327339
15 thanks 0.05560215935111046
15 does 0.04438205063343048
15 using 0.03396100923418999
15 use 0.031639792025089264
16 scsi 0.1266227811574936
16 video 0.03675418719649315
16 card 0.03637675568461418
16 port 0.033804234117269516
16 driver 0.030237805098295212
17 program 0.04696282744407654
17 software 0.03828095644712448
17 write 0.03822388872504234
17 version 0.03776012361049652
17 file 0.03350536897778511
18 law 0.03776385635137558
18 court 0.031376246362924576
18 question 0.027072720229625702
18 control 0.02526409737765789
18 rate 0.021629702299833298
19 god 0.09157539159059525
19 believe 0.03708299621939659
19 church 0.03347333148121834
19 bible 0.032943107187747955
19 true 0.03110317699611187
```
Setting a larger value for `-alpha` is effective for avoiding overfitting. I
mentioned this point in the documentation.
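For intuition on why a larger `-alpha` helps (this is only an illustration of additive smoothing in general, not Hivemall's exact pLSA update rule, and the class and method names below are hypothetical):

```java
import java.util.Arrays;

public class AlphaSmoothing {

    /**
     * Additive (Dirichlet-style) smoothing: a larger alpha pulls the
     * estimated distribution toward uniform, which damps overfitting
     * to words that happen to be rare in the training data.
     */
    static double[] smooth(double[] counts, double alpha) {
        double total = alpha * counts.length;
        for (double c : counts) {
            total += c;
        }
        double[] p = new double[counts.length];
        for (int i = 0; i < counts.length; i++) {
            p[i] = (counts[i] + alpha) / total;
        }
        return p;
    }

    public static void main(String[] args) {
        double[] counts = {9, 1, 0};
        // alpha = 0: raw maximum-likelihood estimate, overfits the zero count
        System.out.println(Arrays.toString(smooth(counts, 0.0)));
        // alpha = 512: heavily smoothed, close to uniform
        System.out.println(Arrays.toString(smooth(counts, 512.0)));
    }
}
```

With `alpha = 0` the third word gets probability exactly zero, while a large alpha keeps all probabilities well away from the extremes.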
Currently, if a user sets inappropriate hyper-parameters and training ends up
in a bad state (i.e., infinite perplexity), the UDF simply throws an exception
at
[HERE](https://github.com/takuti/incubator-hivemall/blob/6139ded111e8b7e872766fc96a4cc58d77aa578b/core/src/main/java/hivemall/topicmodel/IncrementalPLSAModel.java#L266-L267).
Is that okay, @myui? If just logging a warning message is preferable, I will
update it accordingly.
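The warning-only variant could look roughly like the sketch below. This is not the actual `IncrementalPLSAModel` code; the class, method, and message are hypothetical, and the real implementation would use the project's logger rather than `System.err`:

```java
public class PerplexityCheck {

    /**
     * Returns true when the perplexity value is usable. Instead of
     * throwing, an infinite or NaN perplexity is reported as a warning
     * so the caller can decide whether to abort or keep the last model.
     */
    static boolean isUsablePerplexity(double perplexity) {
        if (Double.isInfinite(perplexity) || Double.isNaN(perplexity)) {
            System.err.println("WARN: perplexity diverged (" + perplexity
                + "); consider adjusting hyper-parameters such as -alpha or -delta");
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isUsablePerplexity(42.0));                    // usable
        System.out.println(isUsablePerplexity(Double.POSITIVE_INFINITY)); // warns
    }
}
```

Whether this is preferable depends on policy: an exception fails fast and forces the user to re-tune, while a warning lets a long-running job finish with a possibly degraded model.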