[
https://issues.apache.org/jira/browse/MAHOUT-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13129983#comment-13129983
]
Hudson commented on MAHOUT-766:
-------------------------------
Integrated in Mahout-Quality #1101 (See
[https://builds.apache.org/job/Mahout-Quality/1101/])
MAHOUT-766: Changed m argument to 1.1 and switched Dirichlet to use
clustering vs. classifier implementation. Added cosine distance measure to
reuters kmeans.
jeastman :
http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1185737
Files :
* /mahout/trunk/examples/bin/build-reuters.sh
* /mahout/trunk/examples/src/main/java/org/apache/mahout/clustering/display/DisplayDirichlet.java
* /mahout/trunk/examples/src/main/java/org/apache/mahout/clustering/display/DisplayFuzzyKMeans.java
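
The change of the m argument to 1.1 mentioned in the commit message can be illustrated with the textbook fuzzy c-means membership formula, u_i = 1 / sum_k (d_i / d_k)^(2 / (m - 1)). The sketch below is not Mahout's code; the function name and the three distances are hypothetical values chosen only for illustration. It suggests why, when a point is almost equally far from every centroid, m = 2 gives nearly uniform memberships (so every cluster is pulled toward the same centroid), while m = 1.1 makes the memberships much sharper.

```python
def fuzzy_memberships(distances, m):
    """Textbook fuzzy c-means membership of one point in each cluster.

    u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1))

    Assumes every distance is strictly positive and m > 1.
    This is an illustrative sketch, not Mahout's implementation.
    """
    if m <= 1:
        raise ValueError("m must be greater than 1")
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_k) ** exponent for d_k in distances)
        for d_i in distances
    ]


# Hypothetical distances from one document to three centroids.
# They are deliberately close together, as is common for sparse,
# high-dimensional text vectors.
distances = [0.90, 0.95, 1.00]

for m in (2.0, 1.1):
    memberships = fuzzy_memberships(distances, m)
    print(m, [round(u, 3) for u in memberships])
```

With m = 2 the exponent is 2, so small differences in distance barely change the memberships; with m = 1.1 the exponent is about 20, which amplifies those differences and lets the nearest centroid dominate.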
> fuzzy kmeans - all cluster with the same top terms
> ---------------------------------------------------
>
> Key: MAHOUT-766
> URL: https://issues.apache.org/jira/browse/MAHOUT-766
> Project: Mahout
> Issue Type: Bug
> Components: Clustering, Examples
> Affects Versions: 0.6
> Environment: tested in OSX and linux
> Reporter: Paulo Magalhaes
> Assignee: Jeff Eastman
> Fix For: 0.6
>
>
> I believe there is something wrong with fkmeans in trunk.
> I am using code from trunk (last checkout 6/30/11). Reproducing the problem is
> simple:
> 1) change examples/bin/build-reuters.sh to use fkmeans and set -m 2
> 2) run build-reuters.sh
> 3) Dump the cluster. I'm doing:
> ../../bin/mahout clusterdump -dt sequencefile -s ./mahout-work/reuters-kmeans/clusters-6 -b 100 -o ./reuters-clusterdump.txt -d ./mahout-work/reuters-out-seqdir-sparse-kmeans/dictionary.file-0
> Here is what the clusters look like:
> SV-15898{n=34 c=[0:0.020, 0.003:0.001, 0.006913:0.001, 0.01:0.004,
> 0.02:0.002, 0.03:0.001, 0.046:0.0
> Top Terms:
> said => 1.7254762602900604
> mln => 1.2510936664951733
> dlrs => 1.1340145215097008
> 3 => 1.0643797240793276
> pct => 1.0422760712239152
> reuter => 1.0202689935247569
> its => 0.9997771992646881
> from => 0.9903731234557381
> year => 0.8855389859684145
> vs => 0.8291746545786391
> :SV-14766{n=36 c=[0:0.019, 0.003:0.001, 0.006913:0.001, 0.01:0.004,
> 0.02:0.002, 0.03:0.001, 0.046:0.0
> Top Terms:
> said => 1.6406710289350412
> mln => 1.2174993414858022
> dlrs => 1.0937941570322955
> 3 => 1.0334420773050856
> pct => 0.991539915235039
> reuter => 0.990042452019326
> its => 0.9508638527143669
> from => 0.9403885495991262
> vs => 0.865437130369746
> year => 0.8463503194752994
> :SV-14854{n=36 c=[0:0.019, 0.003:0.001, 0.006913:0.001, 0.01:0.004,
> 0.02:0.002, 0.03:0.001, 0.046:0.0
> Top Terms:
> said => 1.641260962665307
> mln => 1.217806578134094
> dlrs => 1.0941157210136143
> 3 => 1.0336934328877394
> pct => 0.991895013999163
> reuter => 0.9902889592990656
> its => 0.9512076670014483
> from => 0.9407384847445094
> vs => 0.8653426311034671
> year => 0.8466407590692175
> :SV-14890{n=36 c=[0:0.019, 0.003:0.001, 0.006913:0.001, 0.01:0.004,
> 0.02:0.002, 0.03:0.001, 0.046:0.0
> Top Terms:
> said => 1.6410352907185948
> mln => 1.21769021136256
> dlrs => 1.0939933408434481
> 3 => 1.0335977297579235
> pct => 0.991759193577722
> reuter => 0.9901951250301172
> its => 0.9510761761632947
> from => 0.9406047832581563
> vs => 0.8653814488835572
> year => 0.8465301083353372
> :SV-14972{n=36 c=[0:0.019, 0.003:0.001, 0.006913:0.001, 0.01:0.004,
> 0.02:0.002, 0.03:0.001, 0.046:0.0
> Top Terms:
> said => 1.640981249652196
> mln => 1.2176595452829564
> dlrs => 1.093962519439548
> 3 => 1.0335737897463568
> pct => 0.9917266257955816
> reuter => 0.9901715950801396
> its => 0.9510446208123859
> from => 0.9405723357372776
> vs => 0.8653843699725567
> year => 0.846502466267153
> :SV-15023{n=36 c=[0:0.019, 0.003:0.001, 0.006913:0.001, 0.01:0.004,
> 0.02:0.002, 0.03:0.001, 0.046:0.0
> Top Terms:
> said => 1.6399319888551425
> mln => 1.217099157115808
> dlrs => 1.0933830369192543
> 3 => 1.033121271434882
> pct => 0.991094828319561
> reuter => 0.9897275313905611
> its => 0.9504327303592046
> from => 0.9399480272494183
> vs => 0.8655203514280634
> year => 0.8459804922897428
> :SV-15330{n=36 c=[0:0.019, 0.003:0.001, 0.006913:0.001, 0.01:0.004,
> 0.02:0.002, 0.03:0.001, 0.046:0.0
> Top Terms:
> said => 1.6411480082558068
> mln => 1.217746071140758
> dlrs => 1.0940532425506244
> 3 => 1.0336447143638317
> pct => 0.9918269975797083
> reuter => 0.990241145450359
> its => 0.9511417993006985
> from => 0.9406712099799636
> vs => 0.8653569180999117
> year => 0.8465844425179013
> :SV-15403{n=36 c=[0:0.019, 0.003:0.001, 0.006913:0.001, 0.01:0.004,
> 0.02:0.002, 0.03:0.001, 0.046:0.0
> Top Terms:
> said => 1.6493270418577013
> mln => 1.221708475489808
> dlrs => 1.0983489300320377
> 3 => 1.0370024996153944
> pct => 0.9967446058994232
> reuter => 0.993528974793619
> its => 0.9558988111209523
> from => 0.9454911460774864
> vs => 0.8633642497287671
> year => 0.8505083085439775
> :SV-15514{n=36 c=[0:0.019, 0.003:0.001, 0.006913:0.001, 0.01:0.004,
> 0.02:0.002, 0.03:0.001, 0.046:0.0
> Top Terms:
> said => 1.6414524586689534
> mln => 1.2179029815366167
> dlrs => 1.094218299808865
> 3 => 1.033773769117182
> pct => 0.9920102286561391
> reuter => 0.9903676795676004
> its => 0.9513191861395162
> from => 0.9408515920762511
> vs => 0.865304353452142
> year => 0.8467337135094862
> :SV-15549{n=36 c=[0:0.019, 0.003:0.001, 0.006913:0.001, 0.01:0.004,
> 0.02:0.002, 0.03:0.001, 0.046:0.0
> Top Terms:
> said => 1.640632892454694
> mln => 1.2174764812983898
> dlrs => 1.0937717467869699
> 3 => 1.033424727632325
> pct => 0.99151691360307
> reuter => 0.9900253758026865
> its => 0.9508415534060888
> from => 0.9403654699584985
> vs => 0.865436402399392
> year => 0.8463303217162843
> :SV-15616{n=36 c=[0:0.019, 0.003:0.001, 0.006913:0.001, 0.01:0.004,
> 0.02:0.002, 0.03:0.001, 0.046:0.0
> Top Terms:
> said => 1.6402745961421197
> mln => 1.217287104215781
> dlrs => 1.0935749393200054
> 3 => 1.0332709291683844
> pct => 0.9913012005612369
> reuter => 0.9898744911012118
> its => 0.9506326562835085
> from => 0.9401525895225771
> vs => 0.8654873596392523
> year => 0.8461528918952358
> :SV-15674{n=36 c=[0:0.019, 0.003:0.001, 0.006913:0.001, 0.01:0.004,
> 0.02:0.002, 0.03:0.001, 0.046:0.0
> Top Terms:
> said => 1.6402335213893247
> mln => 1.2172651791725515
> dlrs => 1.0935522610806727
> 3 => 1.0332532137000938
> pct => 0.991276468108388
> reuter => 0.9898571070574692
> its => 0.9506087026962596
> from => 0.9401281555632803
> vs => 0.8654927058873914
> year => 0.8461324681573653
> :SV-15720{n=36 c=[0:0.019, 0.003:0.001, 0.006913:0.001, 0.01:0.004,
> 0.02:0.002, 0.03:0.001, 0.046:0.0
> Top Terms:
> said => 1.641454220566282
> mln => 1.2179063418879368
> dlrs => 1.0942205822099829
> 3 => 1.0337754035575257
> pct => 0.9920113271819195
> reuter => 0.9903693325123661
> its => 0.9513202705619623
> from => 0.9408530174807668
> vs => 0.8653096216062077
> year => 0.8467355860669477
> :SV-15732{n=36 c=[0:0.019, 0.003:0.001, 0.006913:0.001, 0.01:0.004,
> 0.02:0.002, 0.03:0.001, 0.046:0.0
> Top Terms:
> said => 1.6418679366988789
> mln => 1.218118262616823
> dlrs => 1.0944441677361394
> 3 => 1.0339502052648608
> pct => 0.9922602967957669
> reuter => 0.9905406967751569
> its => 0.9515612774046113
> from => 0.941098001639954
> vs => 0.865235154416334
> year => 0.8469379811534101
> :SV-15825{n=36 c=[0:0.019, 0.003:0.001, 0.006913:0.001, 0.01:0.004,
> 0.02:0.002, 0.03:0.001, 0.046:0.0
> Top Terms:
> said => 1.6403540331112847
> mln => 1.2173302824011656
> dlrs => 1.0936192179118565
> 3 => 1.0333054698476525
> pct => 0.9913490440255205
> reuter => 0.9899084014354236
> its => 0.9506790000021428
> from => 0.9401999656754023
> vs => 0.8654787849286104
> year => 0.8461927112339609
> :SV-15888{n=36 c=[0:0.019, 0.003:0.001, 0.006913:0.001, 0.01:0.004,
> 0.02:0.002, 0.03:0.001, 0.046:0.0
> Top Terms:
> said => 1.641852069569193
> mln => 1.218106579705691
> dlrs => 1.0944336674208315
> 3 => 1.0339422184421034
> pct => 0.9922506923700831
> reuter => 0.9905327937543529
> its => 0.951551949990525
> from => 0.9410880514065464
> vs => 0.8652299423273659
> year => 0.8469287549740471
> :SV-15944{n=36 c=[0:0.019, 0.003:0.001, 0.006913:0.001, 0.01:0.004,
> 0.02:0.002, 0.03:0.001, 0.046:0.0
> Top Terms:
> said => 1.6406094746503062
> mln => 1.2174640910103491
> dlrs => 1.0937588768380255
> 3 => 1.0334146735611798
> pct => 0.9915028147402405
> reuter => 0.9900155118531778
> its => 0.9508279001565995
> from => 0.9403515526055797
> vs => 0.865439705916966
> year => 0.846318717539638
> :SV-15952{n=36 c=[0:0.019, 0.003:0.001, 0.006913:0.001, 0.01:0.004,
> 0.02:0.002, 0.03:0.001, 0.046:0.0
> Top Terms:
> said => 1.641608350634413
> mln => 1.2179827157677379
> dlrs => 1.094302484756082
> 3 => 1.033839606583586
> pct => 0.9921040410110572
> reuter => 0.990432219413613
> its => 0.9514099986904929
> from => 0.9409438763575203
> vs => 0.8652760331837802
> year => 0.8468099163160301
> :SV-15954{n=36 c=[0:0.019, 0.003:0.001, 0.006913:0.001, 0.01:0.004,
> 0.02:0.002, 0.03:0.001, 0.046:0.0
> Top Terms:
> said => 1.6429205353451672
> mln => 1.2186434984636658
> dlrs => 1.0950054459143779
> 3 => 1.0343894404834142
> pct => 0.992893505149969
> reuter => 0.9909710261706427
> its => 0.9521740690117075
> from => 0.9417194634871013
> vs => 0.8650137662755684
> year => 0.8474476266423354
> :SV-16007{n=36 c=[0:0.019, 0.003:0.001, 0.006913:0.001, 0.01:0.004,
> 0.02:0.002, 0.03:0.001, 0.046:0.0
> Top Terms:
> said => 1.6401767760282457
> mln => 1.2172339691485916
> dlrs => 1.093520432998812
> 3 => 1.0332284013507513
> pct => 0.9912422858233993
> reuter => 0.9898327402827573
> its => 0.9505755879363272
> from => 0.9400942591120444
> vs => 0.8654979916098049
> year => 0.8461038772989482
> :SV-16037{n=36 c=[0:0.019, 0.003:0.001, 0.006913:0.001, 0.01:0.004,
> 0.02:0.002, 0.03:0.001, 0.046:0.0
> Top Terms:
> said => 1.640610618380475
> mln => 1.2174645746382695
> dlrs => 1.0937594396319776
> 3 => 1.0334151203058977
> pct => 0.9915035014016228
> reuter => 0.9900159476830741
> its => 0.9508285640147016
> from => 0.9403522136131415
> vs => 0.8654392679742507
> year => 0.846319234572972