*classify-20newsgroups.sh*
*Complementary naive bayes:*
=======================================================
Summary
-------------------------------------------------------
Correctly Classified Instances : 11207 98.9406%
Incorrectly Classified Instances : 120 1.0594%
Total Classified Instances : 11327
=======================================================
Confusion Matrix
-------------------------------------------------------
a b c d e f g h i j
k l m n o p q r s
t <--Classified as
475 0 0 1 0 0 0 0 0 0
0 0 0 0 1 0 1 0 0
0 | 478 a = alt.atheism
0 597 1 1 0 1 1 0 0 0
0 1 0 2 1 0 0 0 0
0 | 605 b = comp.graphics
0 1 620 3 0 1 0 0 0 0
0 1 0 0 1 0 0 0 0
0 | 627 c = comp.os.ms-windows.misc
1 1 1 593 2 0 0 0 0 0
0 0 0 0 0 1 0 0 0
0 | 599 d = comp.sys.ibm.pc.hardware
0 1 1 0 568 0 1 0 0 0
1 1 2 0 0 0 0 1 0
0 | 576 e = comp.sys.mac.hardware
0 4 2 0 0 581 0 0 0 0
0 0 0 0 0 0 0 0 0
0 | 587 f = comp.windows.x
0 0 0 1 2 0 571 3 0 0
1 1 4 1 0 0 0 0 0
0 | 584 g = misc.forsale
0 0 0 1 0 0 0 589 1 0
0 1 1 0 0 0 0 0 0
0 | 593 h = rec.autos
0 0 0 0 0 0 0 1 565 0
0 0 0 0 1 0 0 0 0
0 | 567 i = rec.motorcycles
0 0 0 0 0 0 0 0 0 600
2 0 0 0 1 0 0 0 0
0 | 603 j = rec.sport.baseball
0 0 0 0 0 0 0 0 0 1
584 0 0 0 0 0 0 0 0
0 | 585 k = rec.sport.hockey
0 0 0 0 0 0 0 0 0 0
0 579 0 0 0 0 0 1 0
0 | 580 l = sci.crypt
0 0 0 1 3 0 2 0 0 2
0 0 567 1 2 1 0 0 0
0 | 579 m = sci.electronics
0 0 0 0 0 0 0 0 0 0
0 0 1 605 0 0 0 0 0
0 | 606 n = sci.med
0 0 0 0 0 0 0 0 0 0
0 0 0 0 602 0 0 0 0
0 | 602 o = sci.space
0 0 0 0 0 0 0 0 0 0
0 0 0 1 0 602 0 0 1
0 | 604 p = soc.religion.christian
0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 556 0 0
0 | 556 q = talk.politics.mideast
0 0 1 0 0 0 0 0 0 0
0 1 0 0 1 0 0 568 0
0 | 571 r = talk.politics.guns
11 0 0 0 0 0 0 0 0 1
0 0 0 1 3 8 1 4 338
2 | 369 s = talk.religion.misc
0 0 0 0 0 0 0 0 0 0
1 0 0 0 1 0 3 4 0
447 | 456 t = talk.politics.misc
=======================================================
Statistics
-------------------------------------------------------
Kappa 0.9806
Accuracy 98.9406%
Reliability 94.0932%
Reliability (standard deviation) 0.2163
Jan 21, 2014 6:37:28 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Program took 15870 ms (Minutes: 0.2645)
+ echo 'Testing on holdout set'
Testing on holdout set
+ ./bin/mahout testnb -i /tmp/mahout-work-ec2-user/20news-test-vectors -m /tmp/mahout-work-ec2-user/model -l /tmp/mahout-work-ec2-user/labelindex -ow -o /tmp/mahout-work-ec2-user/20news-testing -c
[snip]
INFO: Complementary Results:
=======================================================
Summary
-------------------------------------------------------
Correctly Classified Instances : 6715 89.3071%
Incorrectly Classified Instances : 804 10.6929%
Total Classified Instances : 7519
=======================================================
Confusion Matrix
-------------------------------------------------------
a b c d e f g h i j
k l m n o p q r s
t <--Classified as
298 0 0 0 0 0 0 0 0 1
0 0 0 1 2 5 1 0 13
0 | 321 a = alt.atheism
0 298 11 6 1 12 2 2 1 1
3 8 3 4 2 4 1 4 4
1 | 368 b = comp.graphics
1 17 286 16 4 9 6 3 2 0
1 0 1 7 1 0 2 1 0
1 | 358 c = comp.os.ms-windows.misc
2 6 11 309 9 5 14 8 1 0
2 0 6 4 2 0 1 2 1
0 | 383 d = comp.sys.ibm.pc.hardware
0 10 8 7 334 7 5 5 2 0
3 0 2 1 1 0 1 1 0
0 | 387 e = comp.sys.mac.hardware
1 13 7 8 2 355 2 0 2 0
0 5 1 1 3 0 0 1 0
0 | 401 f = comp.windows.x
0 7 11 29 12 9 268 16 8 4
3 2 6 4 2 1 3 1 2
3 | 391 g = misc.forsale
0 1 0 0 3 0 7 362 8 2
2 1 2 0 2 0 1 2 0
4 | 397 h = rec.autos
0 0 0 1 0 0 1 0 423 0
0 0 2 1 0 1 0 0 0
0 | 429 i = rec.motorcycles
0 0 1 0 0 0 0 2 2 371
8 0 2 3 0 2 0 0 0
0 | 391 j = rec.sport.baseball
0 0 1 0 0 0 1 0 0 2
409 0 0 0 0 0 0 0 0
1 | 414 k = rec.sport.hockey
0 0 1 2 1 0 1 0 0 0
0 404 0 0 0 0 0 1 0
1 | 411 l = sci.crypt
0 5 4 11 1 3 7 9 2 5
3 3 339 2 6 0 1 1 2
1 | 405 m = sci.electronics
0 4 0 1 0 0 0 1 0 1
1 0 3 367 3 1 2 0 0
0 | 384 n = sci.med
0 1 2 0 1 0 2 0 0 1
0 0 1 1 375 0 1 0 0
0 | 385 o = sci.space
4 2 1 1 0 0 1 1 2 0
0 1 1 5 1 367 4 0 1
1 | 393 p = soc.religion.christian
0 1 0 0 0 0 0 0 0 2
0 0 0 0 0 2 378 0 1
0 | 384 q = talk.politics.mideast
0 0 0 0 0 2 1 1 1 1
0 3 0 3 0 0 2 319 2
4 | 339 r = talk.politics.guns
32 0 0 1 0 0 0 0 0 1
1 1 0 2 2 26 5 7 175
6 | 259 s = talk.religion.misc
0 0 0 2 0 0 0 0 0 1
2 2 0 1 2 1 10 18 2
278 | 319 t = talk.politics.misc
=======================================================
Statistics
-------------------------------------------------------
Kappa 0.8594
Accuracy 89.3071%
Reliability 84.611%
Reliability (standard deviation) 0.2148
Jan 21, 2014 6:37:39 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Program took 10840 ms (Minutes: 0.18066666666666667)
*Naive bayes:*
INFO: Standard NB Results:
=======================================================
Summary
-------------------------------------------------------
Correctly Classified Instances : 11286 99.0869%
Incorrectly Classified Instances : 104 0.9131%
Total Classified Instances : 11390
=======================================================
Confusion Matrix
-------------------------------------------------------
a b c d e f g h i j
k l m n o p q r s
t <--Classified as
474 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 2
1 | 477 a = alt.atheism
0 566 0 2 0 1 0 0 0 0
0 0 0 0 0 0 0 0 0
0 | 569 b = comp.graphics
0 10 590 29 2 4 1 0 0 0
0 0 1 0 0 0 0 0 0
1 | 638 c = comp.os.ms-windows.misc
0 0 0 596 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 | 596 d = comp.sys.ibm.pc.hardware
0 0 0 0 575 0 1 0 0 0
0 0 1 0 0 0 0 0 0
0 | 577 e = comp.sys.mac.hardware
0 2 2 2 0 593 1 0 0 0
0 0 0 0 1 0 0 0 0
0 | 601 f = comp.windows.x
0 0 0 1 0 0 589 1 0 0
1 0 2 0 0 0 0 0 0
0 | 594 g = misc.forsale
0 0 0 0 0 0 0 594 0 0
0 0 0 0 0 0 0 0 0
0 | 594 h = rec.autos
0 0 0 0 0 0 0 0 611 0
0 0 0 0 0 0 0 0 0
0 | 611 i = rec.motorcycles
0 0 0 0 0 0 0 0 0 616
1 0 0 0 0 0 0 0 0
0 | 617 j = rec.sport.baseball
0 0 0 0 0 0 1 0 0 0
620 0 0 0 0 0 0 0 0
0 | 621 k = rec.sport.hockey
0 0 0 0 0 0 0 0 0 0
0 580 0 0 0 0 0 1 0
0 | 581 l = sci.crypt
0 0 0 3 1 0 0 0 0 0
0 0 571 0 0 0 0 0 0
0 | 575 m = sci.electronics
0 0 0 0 0 0 0 0 0 0
0 0 2 583 0 0 0 0 0
0 | 585 n = sci.med
0 0 0 0 0 0 0 0 0 0
0 0 0 1 599 0 0 0 0
0 | 600 o = sci.space
0 1 0 0 0 0 0 0 0 0
0 0 0 0 0 615 0 0 0
0 | 616 p = soc.religion.christian
1 0 0 0 0 0 0 0 0 0
0 0 0 0 0 1 560 0 0
0 | 562 q = talk.politics.mideast
0 0 1 0 0 0 0 0 0 0
0 1 0 0 0 0 0 548 0
1 | 551 r = talk.politics.guns
10 0 0 0 0 0 0 0 0 0
0 0 0 0 1 1 0 2 344
1 | 359 s = talk.religion.misc
0 0 0 0 0 0 0 0 0 0
0 1 1 0 0 0 0 2 0
462 | 466 t = talk.politics.misc
=======================================================
Statistics
-------------------------------------------------------
Kappa 0.9847
Accuracy 99.0869%
Reliability 94.3334%
Reliability (standard deviation) 0.2169
Jan 21, 2014 9:30:25 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Program took 14304 ms (Minutes: 0.2384)
+ echo 'Testing on holdout set'
Testing on holdout set
[snip]
INFO: Standard NB Results:
=======================================================
Summary
-------------------------------------------------------
Correctly Classified Instances : 6718 90.1019%
Incorrectly Classified Instances : 738 9.8981%
Total Classified Instances : 7456
=======================================================
Confusion Matrix
-------------------------------------------------------
a b c d e f g h i j
k l m n o p q r s
t <--Classified as
294 0 0 0 0 0 0 0 0 0
0 2 0 1 1 6 1 1 16
0 | 322 a = alt.atheism
0 345 6 14 6 11 6 0 0 0
0 5 7 1 3 0 0 0 0
0 | 404 b = comp.graphics
2 29 177 78 22 19 9 1 0 0
0 4 2 0 1 1 0 0 1
1 | 347 c = comp.os.ms-windows.misc
1 9 2 335 18 2 10 0 0 0
1 0 8 0 0 0 0 0 0
0 | 386 d = comp.sys.ibm.pc.hardware
1 4 2 13 347 3 5 1 0 0
1 0 7 1 0 0 0 1 0
0 | 386 e = comp.sys.mac.hardware
0 20 0 4 0 352 4 0 0 0
0 0 1 1 3 0 1 0 1
0 | 387 f = comp.windows.x
0 2 0 21 5 1 323 7 2 2
0 2 12 0 3 0 0 0 0
1 | 381 g = misc.forsale
0 1 0 0 1 0 15 363 8 1
0 0 4 1 0 0 0 1 0
1 | 396 h = rec.autos
0 1 0 0 0 0 6 6 370 0
0 0 0 1 0 0 0 0 1
0 | 385 i = rec.motorcycles
1 0 0 1 1 0 2 1 2 362
5 0 2 0 0 0 0 0 0
0 | 377 j = rec.sport.baseball
0 0 0 1 2 0 0 0 0 3
371 0 0 0 0 0 0 0 0
1 | 378 k = rec.sport.hockey
0 3 1 0 1 0 2 0 0 0
0 396 0 1 0 0 1 1 1
3 | 410 l = sci.crypt
0 7 0 7 7 2 6 4 0 0
0 1 369 2 2 0 0 0 0
2 | 409 m = sci.electronics
0 3 0 2 1 0 2 0 0 0
0 1 4 383 4 0 0 1 0
4 | 405 n = sci.med
0 5 0 0 1 0 3 0 0 0
0 0 1 0 374 1 0 0 1
1 | 387 o = sci.space
6 2 0 1 1 0 0 1 0 1
0 0 1 5 0 352 2 1 7
1 | 381 p = soc.religion.christian
1 1 0 0 0 0 0 0 0 0
1 0 0 0 0 0 373 1 0
1 | 378 q = talk.politics.mideast
0 0 0 0 0 0 1 0 1 0
0 2 0 0 0 0 0 346 2
7 | 359 r = talk.politics.guns
26 1 0 1 0 0 0 2 0 1
1 0 0 1 1 20 2 6 200
7 | 269 s = talk.religion.misc
1 0 0 0 0 0 0 2 0 0
1 0 0 2 2 0 1 14 0
286 | 309 t = talk.politics.misc
=======================================================
Statistics
-------------------------------------------------------
Kappa 0.8726
Accuracy 90.1019%
Reliability 85.4491%
Reliability (standard deviation) 0.2222
Jan 21, 2014 9:30:37 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Program took 10878 ms (Minutes: 0.1813)
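As a quick sanity check on the naive Bayes summaries above, the accuracy percentages can be recomputed from the raw counts. This is a minimal sketch, not part of the Mahout scripts. The run labels are my own, and it assumes the first result block of each pair is the script's self-test on the training data (only the second block is labeled as the holdout set in the output).

```python
# Summary counts copied from the Mahout output above: (correct, total).
# Labels are illustrative; "self-test" is an assumption about the first block.
runs = {
    "CNB self-test (assumed)": (11207, 11327),
    "CNB holdout": (6715, 7519),
    "NB self-test (assumed)": (11286, 11390),
    "NB holdout": (6718, 7456),
}


def accuracy_pct(correct, total):
    """Percentage of correctly classified instances."""
    return 100.0 * correct / total


for name, (correct, total) in runs.items():
    errors = total - correct
    print(f"{name:<24} {accuracy_pct(correct, total):8.4f}%  ({errors} errors)")
```

Both classifiers lose roughly nine to ten percentage points between the first run and the holdout run, which is the gap to watch when comparing release candidates.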
*SGD:*
7532 test files
=======================================================
Summary
-------------------------------------------------------
Correctly Classified Instances : 5649 75%
Incorrectly Classified Instances : 1883 25%
Total Classified Instances : 7532
=======================================================
Confusion Matrix
-------------------------------------------------------
a b c d e f g h i j
k l m n o p q r s
t <--Classified as
186 6 3 10 5 0 33 4 13 15
7 1 24 15 3 15 5 5 29
15 | 394 a = sci.space
5 309 0 3 2 5 0 0 0 1
9 21 2 0 0 18 4 4 1
1 | 385 b = comp.sys.mac.hardware
4 1 101 3 0 1 63 0 7 0
1 1 5 16 3 0 3 7 1
34 | 251 c = talk.religion.misc
11 12 1 265 1 10 3 0 0 17
10 11 5 2 0 11 3 6 21
0 | 389 d = comp.graphics
2 1 1 0 349 2 3 0 3 2
6 1 5 1 0 2 15 2 1
2 | 398 e = rec.motorcycles
7 20 3 19 2 254 6 0 2 11
2 39 7 2 0 4 2 2 9
3 | 394 f = comp.os.ms-windows.misc
2 1 13 0 0 0 247 0 1 1
3 0 6 2 4 0 2 3 5
29 | 319 g = alt.atheism
1 1 0 0 2 0 2 361 0 1
2 0 2 0 0 1 3 22 0
1 | 399 h = rec.sport.hockey
3 0 3 1 0 0 5 0 161 0
1 2 12 102 0 0 1 2 11
6 | 310 i = talk.politics.misc
2 8 0 19 0 19 0 0 1 294
10 11 4 2 0 5 0 3 11
6 | 395 j = comp.windows.x
2 10 0 1 1 0 0 0 0 1
347 13 2 1 0 5 3 2 2
0 | 390 k = misc.forsale
1 36 0 6 1 25 0 0 1 6
10 257 2 1 0 34 6 0 6
0 | 392 l = comp.sys.ibm.pc.hardware
2 2 2 2 1 0 12 0 0 6
10 4 312 5 2 13 11 3 3
6 | 396 m = sci.med
2 0 3 2 1 0 0 1 13 0
5 1 2 314 2 0 2 2 10
4 | 364 n = talk.politics.guns
1 0 2 1 1 0 34 1 33 1
3 0 1 8 271 1 4 5 6
3 | 376 o = talk.politics.mideast
3 14 0 8 2 8 3 1 1 7
12 29 6 2 1 245 13 2 32
4 | 393 p = sci.electronics
3 3 0 2 11 0 1 0 2 1
11 6 4 2 0 11 330 4 4
1 | 396 q = rec.autos
0 0 1 0 1 0 4 12 3 1
3 0 0 0 0 5 6 359 1
1 | 397 r = rec.sport.baseball
0 1 0 0 0 1 0 0 3 3
0 0 3 2 1 6 1 6 366
3 | 396 s = sci.crypt
0 2 11 1 1 0 40 0 1 2
3 4 2 1 0 5 0 2 2
321 | 398 t = soc.religion.christian
=======================================================
Statistics
-------------------------------------------------------
Kappa 0.7073
Accuracy 75%
Reliability 70.6238%
Reliability (standard deviation) 0.2187
Log-likelihood mean : -1.1182
25%-ile : -1.6911
75%-ile : -0.0803
Jan 21, 2014 9:46:39 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Program took 10783 ms (Minutes: 0.17971666666666666)
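For reference, the sketch below shows one standard way to derive an agreement statistic from a confusion matrix: Cohen's kappa, with rows taken as actual labels and columns as predicted labels. It is an illustration only. The function name cohen_kappa is hypothetical, the matrix is the four-class "classification; standard" asf-email result quoted further down in this thread, and Mahout's own Kappa computation may differ, so the value printed here need not match the reported Kappa exactly.

```python
def cohen_kappa(matrix):
    """Cohen's kappa for a square confusion matrix (rows = actual, columns = predicted)."""
    total = sum(sum(row) for row in matrix)
    # Observed agreement: fraction of instances on the diagonal.
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(col) for col in zip(*matrix)]
    # Agreement expected by chance, from the row and column marginals.
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / total ** 2
    return (observed - expected) / (1.0 - expected)


# Four-class matrix from the asf-email "classification; standard" run
# (classes: dev, general, user, commits).
asf_standard = [
    [2949, 7, 531, 25],
    [0, 0, 0, 0],
    [99, 8, 1763, 8],
    [41, 1, 29, 672],
]

accuracy = sum(asf_standard[i][i] for i in range(4)) / sum(map(sum, asf_standard))
print(f"accuracy = {accuracy:.4%}, kappa = {cohen_kappa(asf_standard):.4f}")
```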
On Tue, Jan 21, 2014 at 1:08 PM, Suneel Marthi <[email protected]> wrote:
> Thanks, Andrew, for reporting that. I rolled back the release to fix this
> and a few other issues.
>
> We have removed asf-examples*.sh from trunk as the sample file at the URL
> mentioned in your email is not available.
> This is something we need to fix and restore in 1.0.
>
> On Tuesday, January 21, 2014 3:21 PM, Andrew Palumbo <[email protected]>
> wrote:
>
> from the asf-email-examples.sh script:
>
> # You will need to download or otherwise obtain some or all of the Amazon ASF Email Public Dataset (http://aws.amazon.com/datasets/7791434387204566) to use this script.
> # To obtain a full copy you will need to launch an EC2 instance and mount the dataset to download it, otherwise you can get a sample of it at
> # http://www.lucidimagination.com/devzone/technical-articles/scaling-mahout
>
> It looks like the:
> http://www.lucidimagination.com/devzone/technical-articles/scaling-mahout
>
> link is down.
>
> Is there somewhere else that we can get a subset of the ASF emails?
>
>
>
> Date: Tue, 21 Jan 2014 09:48:06 -0800
> > Subject: Re: MAHOUT 0.9 Release - New URL
> > From: [email protected]
> > To: [email protected]
> >
> > Sure thing; continuing to smoke test the other examples tonight
> >
> >
> > On Tue, Jan 21, 2014 at 9:23 AM, Suneel Marthi <[email protected]> wrote:
> >
> > > > Thanks Andrew M., I see that some of the example scripts need to be fixed as
> > > > they still refer to the deprecated algorithms.
> > > > I see that the Streaming KMeans has failed for you as well.
> > >
> > > I'll be rolling back the release today to fix these issues.
> > >
> > >
> > >
> > >
> > >
> > > On Tuesday, January 21, 2014 1:22 AM, Andrew Musselman <
> > > [email protected]> wrote:
> > >
> > > > Builds on Ubuntu 12.04 from tarball and zip, and on AWS's default 64-bit
> > > > Linux AMI from tarball.
> > >
> > > All tests pass.
> > >
> > > *Output of examples:*
> > > *asf-email-examples.sh, run on mahout.apache.org
> > > <http://mahout.apache.org>:*
> > > *recommendations:*
> > > [ec2-user@ip-10-73-146-199 bin]$ hadoop fs -cat
> > > /user/ec2-user/asf-output/prefs/recommendations/part-r-00000 | less
> > > 1
> > >
> > >
> [21935:1.0,23122:1.0,24084:1.0,26397:1.0,1755:1.0,20743:1.0,13428:1.0,19483:1.0,24067:1.0]
> > > 4
> > >
> > >
> [14372:1.0,28069:1.0,12258:1.0,18412:1.0,26707:1.0,14610:1.0,2909:1.0,14777:1.0,11792:1.0,26764:1.0]
> > > 6
> > >
> > >
> [5442:1.0,18416:1.0,17554:1.0,14610:1.0,16767:1.0,16740:1.0,26743:1.0,11792:1.0,26707:1.0,28116:1.0]
> > > 8
> > > [12758:1.0,19409:1.0,11112:1.0]
> > > 11
> > >
> > >
> [25890:1.0,26743:1.0,9122:1.0,14512:1.0,28116:1.0,17499:1.0,14976:1.0,14561:1.0,3686:1.0,26707:1.0]
> > > 14
> > >
> > >
> [29596:1.0,25567:1.0,19520:1.0,26327:1.0,13809:1.0,29435:1.0,17331:1.0,17290:1.0,17819:1.0,3829:1.0]
> > > 15
> > >
> > >
> [15355:1.0,15322:1.0,23191:1.0,7990:1.0,15318:1.0,15236:1.0,17789:1.0,15286:1.0,20916:1.0,2812:1.0]
> > > 16
> > >
> > >
> [23647:1.0,18137:1.0,1692:1.0,11490:1.0,4303:1.0,12906:1.0,5120:1.0,29503:1.0,19409:1.0,27700:1.0]
> > > 18
> > >
> > >
> [29738:1.0,12070:1.0,24078:1.0,19449:1.0,17819:1.0,11549:1.0,25410:1.0,15228:1.0,24930:1.0,23708:1.0]
> > > 19 [28008:1.0,18416:1.0,2909:1.0,29250:1.0,28023:1.0,14974:1.0]
> > > 20
> > >
> > >
> [19313:1.0,3464:1.0,12394:1.0,18665:1.0,16601:1.0,25816:1.0,10212:1.0,11626:1.0,18577:1.0,16734:1.0]
> > > [snip]
> > >
> > > *clustering; kmeans:*
> > > [snip]
> > > Weight : [props - optional]: Point:
> > > 1.0 :
> > > [distance-squared=1.0193102046188427]:
> > > /commits/200802.gz/20835820.1202052180347.JavaMail.www-data@brutus =
> > > [1065:0.195, 1977:0.355, 2246:0.091, 3008:0.078, 5336:0.110,
> 7573:0.204,
> > > 7683:0.126, 7715:0.365, 7812:0.180, 7832:0.075, 8268:0.093, 9779:0.159,
> > > 10257:0.133, 10972:0.158, 11663:0.143, 15313:0.065, 17007:0.244,
> > > 19359:0.183, 19399:0.338, 19525:0.139, 20224:0.140, 24649:0.095,
> > > 25003:0.076, 29143:0.156, 30459:0.075, 31537:0.156, 31559:0.075,
> > > 31668:0.139, 33208:0.117, 33425:0.218, 36491:0.075, 38378:0.130,
> > > 39789:0.110, 40743:0.190, 45775:0.086]
> > > 1.0 : [distance-squared=0.9823018320457279]:
> > > /commits/200808.gz/1722278226.1219149603005.JavaMail.www-data@brutus =
> > > [1065:0.188, 2246:0.088, 3008:0.076, 3620:0.239, 5200:0.104,
> 5336:0.106,
> > > 6404:0.088, 7552:0.335, 7683:0.122, 7715:0.376, 7812:0.173, 7832:0.072,
> > > 10257:0.128, 11663:0.195, 15313:0.063, 16660:0.094, 19359:0.177,
> > > 19525:0.134, 19551:0.101, 20025:0.183, 21233:0.098, 24649:0.092,
> > > 25003:0.112, 27650:0.283, 27653:0.216, 29143:0.150, 30459:0.072,
> > > 30868:0.208, 31559:0.126, 31565:0.203, 33208:0.113, 36491:0.073,
> > > 36610:0.141, 36767:0.208, 38378:0.125, 39789:0.106, 45775:0.083]
> > > 1.0 : [distance-squared=0.9509142993214911]:
> > > /commits/201006.gz/5844140.863.1277658000780.JavaMail.confluence@thor=
> > > [648:0.100, 914:0.066, 2040:0.076, 2246:0.078, 3008:0.048,
> > > 4419:0.076,
> > > 4452:0.070, 5200:0.065, 5203:0.140, 5336:0.067, 6404:0.056, 7235:0.048,
> > > 7310:0.077, 7464:0.067, 7471:0.060, 7489:0.093, 7505:0.123, 7683:0.077,
> > > 7715:0.145, 7814:0.072, 7912:0.155, 8268:0.098, 9835:0.118,
> 10225:0.081,
> > > 10257:0.114, 11127:0.112, 11510:0.086, 11589:0.139, 11663:0.087,
> > > 12641:0.117, 13837:0.052, 14030:0.062, 14089:0.051, 14352:0.061,
> > > 14396:0.185, 17015:0.115, 17240:0.097, 18767:0.149, 19774:0.124,
> > > 20346:0.159, 21233:0.075, 23657:0.089, 23939:0.078, 23974:0.105,
> > > 23998:0.146, 24962:0.122, 25003:0.093, 25084:0.151, 25128:0.052,
> > > 29143:0.095, 30459:0.046, 30806:0.075, 31559:0.046, 31727:0.104,
> > > 31895:0.105, 31900:0.153, 32149:0.079, 32993:0.069, 33112:0.177,
> > > 33208:0.101, 33351:0.089, 33533:0.079, 33638:0.042, 35795:0.066,
> > > 36189:0.078, 36491:0.046, 36500:0.093, 36625:0.200, 37111:0.071,
> > > 39336:0.079, 39789:0.067, 39933:0.073, 39967:0.079, 41155:0.167,
> > > 41280:0.065, 41696:0.072, 41947:0.118,
> > > 43685:0.086, 44077:0.308,
> > > 44353:0.215, 44423:0.085, 45215:0.151, 45775:0.052, 46766:0.074,
> > > 47823:0.082, 48120:0.080, 48212:0.109, 48436:0.110]
> > > [snip]
> > >
> > > *clustering; dirichlet:*
> > > Get this complaint:
> > > Running Dirichlet with K = 8
> > > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and
> > > HADOOP_CONF_DIR=
> > > > MAHOUT-JOB: /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
> > > > 14/01/21 05:16:35 WARN driver.MahoutDriver: Unable to add class: dirichlet
> > > 14/01/21 05:16:35 WARN driver.MahoutDriver: No dirichlet.props found on
> > > classpath, will use command-line arguments only
> > > Unknown program 'dirichlet' chosen.
> > >
> > > *clustering: minhash:*
> > > Running Minhash
> > > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and
> > > HADOOP_CONF_DIR=
> > > > MAHOUT-JOB: /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
> > > > 14/01/21 05:17:27 WARN driver.MahoutDriver: Unable to add class: minhash
> > > 14/01/21 05:17:27 WARN driver.MahoutDriver: No minhash.props found on
> > > classpath, will use command-line arguments only
> > > Unknown program 'minhash' chosen.
> > >
> > > *classification; standard:*
> > > =======================================================
> > > Summary
> > > -------------------------------------------------------
> > > Correctly Classified Instances : 5384 87.7874%
> > > Incorrectly Classified Instances : 749 12.2126%
> > > Total Classified Instances : 6133
> > >
> > > =======================================================
> > > Confusion Matrix
> > > -------------------------------------------------------
> > > > a b c d <--Classified as
> > > 2949 7 531 25 | 3512 a = dev
> > > 0 0 0 0 | 0 b = general
> > > 99 8 1763 8 | 1878 c = user
> > > 41 1 29 672 | 743 d = commits
> > >
> > > =======================================================
> > > Statistics
> > > -------------------------------------------------------
> > > > Kappa 0.7877
> > > Accuracy 87.7874%
> > > Reliability 53.658%
> > > Reliability (standard deviation) 0.4911
> > >
> > > *classification; complementary:*
> > > =======================================================
> > > Summary
> > > -------------------------------------------------------
> > > Correctly Classified Instances : 5530 90.1679%
> > > Incorrectly Classified Instances : 603 9.8321%
> > > > Total Classified Instances : 6133
> > >
> > > =======================================================
> > > Confusion Matrix
> > > -------------------------------------------------------
> > > a b c d <--Classified as
> > > 3168 0 276 68 | 3512 a = dev
> > > 0 0 0 0 | 0 b = general
> > > 196 0 1652 30 | 1878 c = user
> > > > 25 0 8 710 | 743 d = commits
> > >
> > > =======================================================
> > > Statistics
> > > -------------------------------------------------------
> > > Kappa 0.8259
> > > Accuracy 90.1679%
> > > Reliability 54.7459%
> > > Reliability (standard deviation) 0.5005
> > >
> > > > 14/01/21 05:28:42 INFO driver.MahoutDriver: Program took 20901 ms (Minutes: 0.34836666666666666)
> > >
> > > *classification; sgd, with three categories:*
> > > Running SGD Training
> > > > Running on hadoop, using /home/ec2-user/hadoop-1.2.1/bin/hadoop and
> > > > HADOOP_CONF_DIR=
> > > > MAHOUT-JOB: /home/ec2-user/mahout-distribution-0.9/examples/target/mahout-examples-0.9-job.jar
> > > > 14/01/21 05:58:00 WARN driver.MahoutDriver: No
> > > > org.apache.mahout.classifier.sgd.TrainASFEmail.props found on classpath,
> > > > will use command-line arguments only
> > > 14/01/21 05:58:00 INFO common.AbstractJob: Command line arguments:
> > > {--cardinality=[100000], --categories=[3], --endPhase=[2147483647],
> > > --input=[asf-output/classification/sgd/splits/mapRedOut/],
> > > --output=[asf-output/classification/sgd/models], --poolSize=[5],
> > > --startPhase=[0], --tempDir=[temp], --threads=[20]}
> > > 24168 training files
> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 1
> > > 0.000 0.00 none
> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000
> > > 2
> > > 0.000 0.00 none
> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 3
> > > 0.000 0.00 none
> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 4
> > > 0.000 0.00 none
> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 6
> > > 0.000 0.00 none
> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 8
> > > 0.000 0.00 none
> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 10
> > > 0.000 0.00 none
> > > 0.00 0.00
> > > 0.00 0.00 0.0000000 0.0000000 12
> > > 0.000 0.00 none
> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 15
> > > 0.000 0.00 none
> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 20
> > > 0.000 0.00 none
> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 25
> > > 0.000 0.00 none
> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 30
> > > 0.000 0.00 none
> > > 0.00 0.00 0.00 0.00 0.0000000
> > > 0.0000000 40
> > > 0.000 0.00 none
> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 50
> > > 0.000 0.00 none
> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 60
> > > 0.000 0.00 none
> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 70
> > > 0.000 0.00 none
> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 80
> > > 0.000 0.00 none
> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 100
> > > 0.000
> > > 0.00 none
> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 120
> > > 0.000 0.00 none
> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 140
> > > 0.000 0.00 none
> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 150
> > > 0.000 0.00 none
> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 200
> > > 0.000 0.00 none
> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 250
> > > 0.000 0.00 none
> > > 0.00 0.00
> > > 0.00 0.00 0.0000000 0.0000000 300
> > > 0.000 0.00 none
> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 400
> > > 0.000 0.00 none
> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 500
> > > 0.000 0.00 none
> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 600
> > > 0.000 0.00 none
> > > 0.00 0.00 0.00 0.00 0.0000000 0.0000000 700
> > > 0.000 0.00 none
> > > 0.00 0.00 0.00 0.00 0.0000000
> > > 0.0000000 800
> > > 0.000 0.00 none
> > > 0.13 32659.00 12672.00 82.50 1.3512194e-08
> > > 1.0019413e-08 1000 -0.607 75.78 none
> > > 0.13 32659.00 12672.00 82.50 1.3512194e-08
> > > 1.0019413e-08 1200 -0.607 75.78 none
> > > 0.13 32659.00 12672.00 82.50 1.3512194e-08
> > > 1.0019413e-08 1400 -0.607 75.78 none
> > > 0.13 32659.00 12672.00 82.50 1.3512194e-08
> > > 1.0019413e-08 1500 -0.607 75.78 none
> > > 0.24 43686.00 17924.00 329.50
> > > 1.0571799e-08
> > > 1.0032261e-08 2000 -0.487 82.65 none
> > > 0.24 49753.00 21610.00 330.71 1.3770070e-08
> > > 1.0011902e-08 2500 -0.439 83.90 none
> > > 0.24 49753.00 21610.00 330.71 1.3770070e-08
> > > 1.0011902e-08 3000 -0.439 83.90 none
> > > 0.32 50635.00 28531.00 437.09 1.0551175e-08
> > > 1.0000001e-08 4000 -0.351 88.14 none
> > > 0.32 50635.00 32642.00 437.09 1.0551175e-08
> > > 1.0000000e-08 5000 -0.378 87.10 none
> > > 0.32 50635.00 36461.00 437.09
> > > 1.0556652e-08
> > > 1.0000001e-08 6000 -0.372 86.89 none
> > > 0.32 50635.00 37768.00 437.09 1.0576742e-08
> > > 1.0000001e-08 7000 -0.334 89.26 none
> > > 0.32 50635.00 38807.00 437.09 1.0576742e-08
> > > 1.0000000e-08 8000 -0.368 87.52 none
> > > 0.32 50635.00 44731.00 437.09 1.0576716e-08
> > > 1.0000000e-08 10000 -0.374 87.39 none
> > > 0.32 50635.00 45672.00 437.09 1.0576716e-08
> > > 1.0000000e-08 12000 -0.298 88.26 none
> > > > Exception in thread "main" java.lang.IllegalStateException: java.lang.ArrayIndexOutOfBoundsException: 2
> > > >     at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.trainWithBufferedExamples(AdaptiveLogisticRegression.java:175)
> > > >     at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.train(AdaptiveLogisticRegression.java:147)
> > > >     at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression.train(AdaptiveLogisticRegression.java:132)
> > > >     at org.apache.mahout.classifier.sgd.TrainASFEmail.run(TrainASFEmail.java:109)
> > > >     at org.apache.mahout.classifier.sgd.TrainASFEmail.main(TrainASFEmail.java:142)
> > > >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > > >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> > > >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > > >     at java.lang.reflect.Method.invoke(Method.java:622)
> > > >     at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
> > > >     at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
> > > >     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
> > > >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > > >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> > > >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > > >     at java.lang.reflect.Method.invoke(Method.java:622)
> > > >     at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
> > > > Caused by: java.lang.ArrayIndexOutOfBoundsException: 2
> > > >     at org.apache.mahout.math.DenseVector.setQuick(DenseVector.java:141)
> > > >     at org.apache.mahout.classifier.sgd.DefaultGradient.apply(DefaultGradient.java:44)
> > > >     at org.apache.mahout.classifier.sgd.AbstractOnlineLogisticRegression.train(AbstractOnlineLogisticRegression.java:167)
> > > >     at org.apache.mahout.classifier.sgd.CrossFoldLearner.train(CrossFoldLearner.java:137)
> > > >     at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$Wrapper.train(AdaptiveLogisticRegression.java:444)
> > > >     at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$1.apply(AdaptiveLogisticRegression.java:158)
> > > >     at org.apache.mahout.classifier.sgd.AdaptiveLogisticRegression$1.apply(AdaptiveLogisticRegression.java:153)
> > > >     at org.apache.mahout.ep.EvolutionaryProcess$1.call(EvolutionaryProcess.java:148)
> > > >     at org.apache.mahout.ep.EvolutionaryProcess$1.call(EvolutionaryProcess.java:145)
> > > >     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> > > >     at java.util.concurrent.FutureTask.run(FutureTask.java:166)
> > > >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
> > > >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > > >     at java.lang.Thread.run(Thread.java:701)
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > On Mon, Jan 20, 2014 at 9:37 AM, Andrew Musselman <
> > > [email protected]> wrote:
> > >
> > > > Trying out the build today
> > > >
> > > >
> > > > > On Mon, Jan 20, 2014 at 6:00 AM, Suneel Marthi <[email protected]> wrote:
> > > > >
> > > > >> This is an issue (a trivial one, though) that needs to be fixed for the 0.9
> > > > >> release. I will be rerolling the release today (in the next few hours) and
> > > > >> putting out a new release candidate in staging.
> > > >>
> > > >> Thanks for reporting this Andrew P.
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >> On Monday, January 20, 2014 12:34 AM, Andrew Palumbo <
> > > [email protected]>
> > > >> wrote:
> > > >>
> > > > >> I ran through the tests on a CentOS VM (AMD64, 2 cores, 4 GB RAM). I had
> > > > >> a bit of trouble getting the Hadoop natives to compile and therefore may
> > > > >> have run into some problems because of the Hadoop setup. I ran into some
> > > > >> problems in the example scripts, particularly with
> > > > >> ./cluster-syntheticcontrol.sh ->4,5. I will run through the rest of the
> > > > >> examples when I'm sure I've got Hadoop set up right.
> > > >>
> > > >>
> > > >> Apache Maven 3.1.2-SNAPSHOT
> > > >> Java version: 1.6.0_45, vendor: Sun Microsystems Inc.
> > > >> Java home: /usr/java/jdk1.6.0_45/jre
> > > >> OS name: "linux", version: "2.6.32-358.23.2.el6.x86_64", arch:
> "amd64",
> > > >> family: "unix"
> > > >> $MAHOUT_LOCAL=true
> > > >> Hadoop 2.2.0
> > > >>
> > > >>
> > > >> a) Verify that u can unpack the release (tar or zip) ...passed (tar)
> > > >> [passed ]
> > > >>
> > > > >> b) Verify u r able to compile the distro
> > > >>
> > > >> mvn compile- [passed with warnings]
> > > >>
> > > > >> [WARNING] Expected all dependencies to require Scala version: 2.9.3
> > > > >> [WARNING] org.apache.mahout:mahout-math-scala:0.9 requires scala version: 2.9.3
> > > > >> [WARNING] org.scalatest:scalatest_2.9.2:1.9.1 requires scala version: 2.9.2
> > > > >> [WARNING] Multiple versions of scala libraries detected!
> > > >>
> > > >> c) Run through the unit tests: mvn clean test
> > > >> mvn clean test [passed]
> > > >>
> > > > >> d) Run the example scripts under $MAHOUT_HOME/examples/bin.
> > > > >> Please run through all the different options in each script
> > > >>
> > > >> Running example scripts with $MAHOUT_LOCAL=true
> > > >>
> > > >>
> > > > >> ./cluster-syntheticcontrol.sh ->1 [works]
> > > >> ./cluster-syntheticcontrol.sh ->2 [works]
> > > >> ./cluster-syntheticcontrol.sh ->3 [works]
> > > >>
> > > >>
> > > >> ./cluster-syntheticcontrol.sh ->4 [exits, throws exception]
> > > >> [...]
> > > > >> WARNING: Unable to add class: org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
> > > > >> java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
> > > > >>     at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
> > > > >>     at java.security.AccessController.doPrivileged(Native Method)
> > > > >>     at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
> > > > >>     at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
> > > > >>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
> > > > >>     at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
> > > > >>     at java.lang.Class.forName0(Native Method)
> > > > >>     at java.lang.Class.forName(Class.java:171)
> > > > >>     at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
> > > > >>     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
> > > >> Jan 19, 2014 7:55:31 PM org.slf4j.impl.JCLLoggerAdapter warn
> > > >>
> > > >>
> > > >> ./cluster-syntheticcontrol.sh ->5 [exits, throws exception]
> > > >>
> > > > >> WARNING: Unable to add class: org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
> > > > >> java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
> > > > >>     at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
> > > > >>     at java.security.AccessController.doPrivileged(Native Method)
> > > > >>     at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
> > > > >>     at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
> > > > >>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
> > > > >>     at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
> > > > >>     at java.lang.Class.forName0(Native Method)
> > > > >>     at java.lang.Class.forName(Class.java:171)
> > > > >>     at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
> > > > >>     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:128)
> > > > >> Jan 19, 2014 7:59:51 PM org.slf4j.impl.JCLLoggerAdapter warn
> > > > >> WARNING: No org.apache.mahout.clustering.syntheticcontrol.meanshift.Job.props found on
> > > > >> classpath, will use command-line arguments only
> > > > >> Unknown program 'org.apache.mahout.clustering.syntheticcontrol.meanshift.Job' chosen.
> > > >>
> > > >>
> > > >> ./classify-20newsgroups.sh ->1 [works]
> > > >> ./classify-20newsgroups.sh ->2 [works]
> > > >>
> > > >>
> > > > >> cluster-reuters.sh ->1 [works]
> > > > >> cluster-reuters.sh ->2 [works]
> > > > >> cluster-reuters.sh ->3 [works]
> > > > >>
> > > > >> Same error as noted previously in the thread:
> > > >>
> > > >> cluster-reuters.sh ->4 [0 clusters]
> > > >>
> > > >> [...]
> > > >>
> > > >> WARNING: No qualcluster.props found on classpath, will use
> > > >> command-line arguments only
> > > >> Num clusters: 0; maxDistance: 0.000000
> > > > >> [Dunn Index] First: Infinity
> > > > >> [Davies-Bouldin Index] First: NaN
> > > > >> Jan 19, 2014 7:13:57 PM org.slf4j.impl.JCLLoggerAdapter info
> > > > >> INFO: Program took 669 ms (Minutes: 0.01115)
> > > > >> cluster,distance.mean,distance.sd,distance.q0,distance.q1,distance.q2,distance.q3,distance.q4,count,is.train
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >> > Date: Thu, 16 Jan 2014 06:41:09 -0800
> > > >> > From: [email protected]
> > > >> > Subject: MAHOUT 0.9 Release - New URL
> > > >> > To: [email protected]; [email protected]
> > > >> >
> > > >> > Third time's a Charm!!!
> > > >> >
> > > >> >
> > > >> > Here's the new URL for Mahout 0.9 Release:
> > > >> >
> > > >>
> > >
> https://repository.apache.org/content/repositories/orgapachemahout-1002/org/apache/mahout/mahout-distribution/0.9/
> > > >> >
> > > > >> > For those volunteering to test this, some of the things to be verified:
> > > > >> >
> > > > >> > a) Verify that u can unpack the release (tar or zip)
> > > > >> > b) Verify u r able to compile the distro
> > > > >> > c) Run through the unit tests: mvn clean test
> > > > >> > d) Run the example scripts under $MAHOUT_HOME/examples/bin. Please run
> > > > >> > through all the different options in each script.
> > > > >> >
> > > > >> >
> > > > >> > Committers and PMC members:
> > > > >> > ---------------------------------------
> > > > >> >
> > > > >> > Need 'at least 3 +1 votes' for the Release to pass.
> > > > >> >
> > > > >> >
> > > > >> > Thanks and Regards.
> > > >>
> > > >
> > > >
> > >
>