See <https://builds.apache.org/job/Mahout-Examples-Cluster-Reuters/152/changes>
Changes:

[ssc] MAHOUT-986 Remove old LDA implementation from codebase
[robinanil] MAHOUT-1006 making end to end example work
[gsingers] MAHOUT-1023: make sure getConf gets a real conf

------------------------------------------
[...truncated 7280 lines...]
12/06/03 19:22:24 INFO mapred.MapTask: kvstart = 0; kvend = 262144; length = 327680
12/06/03 19:22:24 INFO mapred.MapTask: Finished spill 0
12/06/03 19:22:25 INFO mapred.MapTask: Starting flush of map output
12/06/03 19:22:25 INFO mapred.MapTask: Finished spill 1
12/06/03 19:22:25 INFO mapred.Merger: Merging 2 sorted segments
12/06/03 19:22:25 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 880434 bytes
12/06/03 19:22:25 INFO mapred.Task: Task:attempt_local_0002_m_000002_0 is done. And is in the process of commiting
12/06/03 19:22:27 INFO mapred.LocalJobRunner:
12/06/03 19:22:27 INFO mapred.Task: Task 'attempt_local_0002_m_000002_0' done.
12/06/03 19:22:27 INFO mapred.MapTask: io.sort.mb = 100
12/06/03 19:22:27 INFO mapred.MapTask: data buffer = 79691776/99614720
12/06/03 19:22:27 INFO mapred.MapTask: record buffer = 262144/327680
12/06/03 19:22:27 INFO mapred.MapTask: Spilling map output: record full = true
12/06/03 19:22:27 INFO mapred.MapTask: bufstart = 0; bufend = 3828741; bufvoid = 99614720
12/06/03 19:22:27 INFO mapred.MapTask: kvstart = 0; kvend = 262144; length = 327680
12/06/03 19:22:27 INFO mapred.MapTask: Starting flush of map output
12/06/03 19:22:27 INFO mapred.MapTask: Finished spill 0
12/06/03 19:22:28 INFO mapred.MapTask: Finished spill 1
12/06/03 19:22:28 INFO mapred.Merger: Merging 2 sorted segments
12/06/03 19:22:28 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 724150 bytes
12/06/03 19:22:28 INFO mapred.Task: Task:attempt_local_0002_m_000003_0 is done. And is in the process of commiting
12/06/03 19:22:30 INFO mapred.LocalJobRunner:
12/06/03 19:22:30 INFO mapred.Task: Task 'attempt_local_0002_m_000003_0' done.
12/06/03 19:22:30 INFO mapred.LocalJobRunner:
12/06/03 19:22:30 INFO mapred.Merger: Merging 4 sorted segments
12/06/03 19:22:30 INFO mapred.Merger: Down to the last merge-pass, with 4 segments left of total size: 3360542 bytes
12/06/03 19:22:30 INFO mapred.LocalJobRunner:
12/06/03 19:22:30 INFO mapred.Task: Task:attempt_local_0002_r_000000_0 is done. And is in the process of commiting
12/06/03 19:22:30 INFO mapred.LocalJobRunner:
12/06/03 19:22:30 INFO mapred.Task: Task attempt_local_0002_r_000000_0 is allowed to commit now
12/06/03 19:22:30 INFO output.FileOutputCommitter: Saved output of task 'attempt_local_0002_r_000000_0' to /tmp/mahout-work-jenkins/reuters-out-seqdir-sparse-lda/wordcount
12/06/03 19:22:33 INFO mapred.LocalJobRunner: reduce > reduce
12/06/03 19:22:33 INFO mapred.Task: Task 'attempt_local_0002_r_000000_0' done.
12/06/03 19:22:34 INFO mapred.JobClient: map 100% reduce 100%
12/06/03 19:22:34 INFO mapred.JobClient: Job complete: job_local_0002
12/06/03 19:22:34 INFO mapred.JobClient: Counters: 16
12/06/03 19:22:34 INFO mapred.JobClient: File Output Format Counters
12/06/03 19:22:34 INFO mapred.JobClient: Bytes Written=1000743
12/06/03 19:22:34 INFO mapred.JobClient: FileSystemCounters
12/06/03 19:22:34 INFO mapred.JobClient: FILE_BYTES_READ=385720347
12/06/03 19:22:34 INFO mapred.JobClient: FILE_BYTES_WRITTEN=326116531
12/06/03 19:22:34 INFO mapred.JobClient: File Input Format Counters
12/06/03 19:22:34 INFO mapred.JobClient: Bytes Read=15194231
12/06/03 19:22:34 INFO mapred.JobClient: Map-Reduce Framework
12/06/03 19:22:34 INFO mapred.JobClient: Reduce input groups=73413
12/06/03 19:22:34 INFO mapred.JobClient: Map output materialized bytes=3360558
12/06/03 19:22:34 INFO mapred.JobClient: Combine output records=190745
12/06/03 19:22:34 INFO mapred.JobClient: Map input records=21578
12/06/03 19:22:34 INFO mapred.JobClient: Reduce shuffle bytes=0
12/06/03 19:22:34 INFO mapred.JobClient: Reduce output records=41807
12/06/03 19:22:34 INFO mapred.JobClient: Spilled Records=572235
12/06/03 19:22:34 INFO mapred.JobClient: Map output bytes=22501851
12/06/03 19:22:34 INFO mapred.JobClient: Combine input records=1540960
12/06/03 19:22:34 INFO mapred.JobClient: Map output records=1540960
12/06/03 19:22:34 INFO mapred.JobClient: SPLIT_RAW_BYTES=628
12/06/03 19:22:34 INFO mapred.JobClient: Reduce input records=190745
12/06/03 19:22:34 INFO input.FileInputFormat: Total input paths to process : 4
12/06/03 19:22:34 INFO filecache.TrackerDistributedCacheManager: Creating dictionary.file-0 in /tmp/hadoop-jenkins/mapred/local/archive/-6328907927761479619_-1970001630_869041919/file/tmp/mahout-work-jenkins/reuters-out-seqdir-sparse-lda-work-5627208055286448987 with rwxr-xr-x
12/06/03 19:22:34 INFO filecache.TrackerDistributedCacheManager: Cached /tmp/mahout-work-jenkins/reuters-out-seqdir-sparse-lda/dictionary.file-0 as /tmp/hadoop-jenkins/mapred/local/archive/-6328907927761479619_-1970001630_869041919/file/tmp/mahout-work-jenkins/reuters-out-seqdir-sparse-lda/dictionary.file-0
12/06/03 19:22:34 INFO filecache.TrackerDistributedCacheManager: Cached /tmp/mahout-work-jenkins/reuters-out-seqdir-sparse-lda/dictionary.file-0 as /tmp/hadoop-jenkins/mapred/local/archive/-6328907927761479619_-1970001630_869041919/file/tmp/mahout-work-jenkins/reuters-out-seqdir-sparse-lda/dictionary.file-0
12/06/03 19:22:34 INFO mapred.JobClient: Running job: job_local_0003
12/06/03 19:22:34 INFO mapred.MapTask: io.sort.mb = 100
12/06/03 19:22:34 INFO mapred.MapTask: data buffer = 79691776/99614720
12/06/03 19:22:34 INFO mapred.MapTask: record buffer = 262144/327680
12/06/03 19:22:35 INFO mapred.MapTask: Starting flush of map output
12/06/03 19:22:35 INFO mapred.MapTask: Finished spill 0
12/06/03 19:22:35 INFO mapred.Task: Task:attempt_local_0003_m_000000_0 is done. And is in the process of commiting
12/06/03 19:22:35 INFO mapred.JobClient: map 0% reduce 0%
12/06/03 19:22:37 INFO mapred.LocalJobRunner:
12/06/03 19:22:37 INFO mapred.Task: Task 'attempt_local_0003_m_000000_0' done.
12/06/03 19:22:37 INFO mapred.MapTask: io.sort.mb = 100
12/06/03 19:22:37 INFO mapred.MapTask: data buffer = 79691776/99614720
12/06/03 19:22:37 INFO mapred.MapTask: record buffer = 262144/327680
12/06/03 19:22:38 INFO mapred.MapTask: Starting flush of map output
12/06/03 19:22:38 INFO mapred.MapTask: Finished spill 0
12/06/03 19:22:38 INFO mapred.Task: Task:attempt_local_0003_m_000001_0 is done. And is in the process of commiting
12/06/03 19:22:38 INFO mapred.JobClient: map 100% reduce 0%
12/06/03 19:22:40 INFO mapred.LocalJobRunner:
12/06/03 19:22:40 INFO mapred.Task: Task 'attempt_local_0003_m_000001_0' done.
12/06/03 19:22:40 INFO mapred.MapTask: io.sort.mb = 100
12/06/03 19:22:40 INFO mapred.MapTask: data buffer = 79691776/99614720
12/06/03 19:22:40 INFO mapred.MapTask: record buffer = 262144/327680
12/06/03 19:22:41 INFO mapred.MapTask: Starting flush of map output
12/06/03 19:22:41 INFO mapred.MapTask: Finished spill 0
12/06/03 19:22:41 INFO mapred.Task: Task:attempt_local_0003_m_000002_0 is done. And is in the process of commiting
12/06/03 19:22:43 INFO mapred.LocalJobRunner:
12/06/03 19:22:43 INFO mapred.Task: Task 'attempt_local_0003_m_000002_0' done.
12/06/03 19:22:43 INFO mapred.MapTask: io.sort.mb = 100
12/06/03 19:22:43 INFO mapred.MapTask: data buffer = 79691776/99614720
12/06/03 19:22:43 INFO mapred.MapTask: record buffer = 262144/327680
12/06/03 19:22:44 INFO mapred.MapTask: Starting flush of map output
12/06/03 19:22:44 INFO mapred.MapTask: Finished spill 0
12/06/03 19:22:44 INFO mapred.Task: Task:attempt_local_0003_m_000003_0 is done. And is in the process of commiting
12/06/03 19:22:46 INFO mapred.LocalJobRunner:
12/06/03 19:22:46 INFO mapred.Task: Task 'attempt_local_0003_m_000003_0' done.
12/06/03 19:22:46 INFO mapred.LocalJobRunner:
12/06/03 19:22:46 INFO mapred.Merger: Merging 4 sorted segments
12/06/03 19:22:46 INFO mapred.Merger: Down to the last merge-pass, with 4 segments left of total size: 14870368 bytes
12/06/03 19:22:46 INFO mapred.LocalJobRunner:
12/06/03 19:22:49 INFO mapred.Task: Task:attempt_local_0003_r_000000_0 is done. And is in the process of commiting
12/06/03 19:22:49 INFO mapred.LocalJobRunner:
12/06/03 19:22:49 INFO mapred.Task: Task attempt_local_0003_r_000000_0 is allowed to commit now
12/06/03 19:22:49 INFO output.FileOutputCommitter: Saved output of task 'attempt_local_0003_r_000000_0' to /tmp/mahout-work-jenkins/reuters-out-seqdir-sparse-lda/partial-vectors-0
12/06/03 19:22:49 INFO mapred.LocalJobRunner: reduce > reduce
12/06/03 19:22:49 INFO mapred.Task: Task 'attempt_local_0003_r_000000_0' done.
12/06/03 19:22:50 INFO mapred.JobClient: map 100% reduce 100%
12/06/03 19:22:50 INFO mapred.JobClient: Job complete: job_local_0003
12/06/03 19:22:50 INFO mapred.JobClient: Counters: 16
12/06/03 19:22:50 INFO mapred.JobClient: File Output Format Counters
12/06/03 19:22:50 INFO mapred.JobClient: Bytes Written=16112881
12/06/03 19:22:50 INFO mapred.JobClient: FileSystemCounters
12/06/03 19:22:50 INFO mapred.JobClient: FILE_BYTES_READ=616355240
12/06/03 19:22:50 INFO mapred.JobClient: FILE_BYTES_WRITTEN=530105780
12/06/03 19:22:50 INFO mapred.JobClient: File Input Format Counters
12/06/03 19:22:50 INFO mapred.JobClient: Bytes Read=15194231
12/06/03 19:22:50 INFO mapred.JobClient: Map-Reduce Framework
12/06/03 19:22:50 INFO mapred.JobClient: Reduce input groups=21578
12/06/03 19:22:50 INFO mapred.JobClient: Map output materialized bytes=14870384
12/06/03 19:22:50 INFO mapred.JobClient: Combine output records=0
12/06/03 19:22:50 INFO mapred.JobClient: Map input records=21578
12/06/03 19:22:50 INFO mapred.JobClient: Reduce shuffle bytes=0
12/06/03 19:22:50 INFO mapred.JobClient: Reduce output records=21578
12/06/03 19:22:50 INFO mapred.JobClient: Spilled Records=43156
12/06/03 19:22:50 INFO mapred.JobClient: Map output bytes=14791487
12/06/03 19:22:50 INFO mapred.JobClient: Combine input records=0
12/06/03 19:22:50 INFO mapred.JobClient: Map output records=21578
12/06/03 19:22:50 INFO mapred.JobClient: SPLIT_RAW_BYTES=628
12/06/03 19:22:50 INFO mapred.JobClient: Reduce input records=21578
12/06/03 19:22:50 INFO common.HadoopUtil: Deleting /tmp/mahout-work-jenkins/reuters-out-seqdir-sparse-lda/tf-vectors
12/06/03 19:22:50 INFO input.FileInputFormat: Total input paths to process : 1
12/06/03 19:22:50 INFO mapred.JobClient: Running job: job_local_0004
12/06/03 19:22:50 INFO mapred.MapTask: io.sort.mb = 100
12/06/03 19:22:50 INFO mapred.MapTask: data buffer = 79691776/99614720
12/06/03 19:22:50 INFO mapred.MapTask: record buffer = 262144/327680
12/06/03 19:22:51 INFO mapred.MapTask: Starting flush of map output
12/06/03 19:22:51 INFO mapred.MapTask: Finished spill 0
12/06/03 19:22:51 INFO mapred.Task: Task:attempt_local_0004_m_000000_0 is done. And is in the process of commiting
12/06/03 19:22:51 INFO mapred.JobClient: map 0% reduce 0%
12/06/03 19:22:53 INFO mapred.LocalJobRunner:
12/06/03 19:22:53 INFO mapred.Task: Task 'attempt_local_0004_m_000000_0' done.
12/06/03 19:22:53 INFO mapred.LocalJobRunner:
12/06/03 19:22:53 INFO mapred.Merger: Merging 1 sorted segments
12/06/03 19:22:53 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 15774462 bytes
12/06/03 19:22:53 INFO mapred.LocalJobRunner:
12/06/03 19:22:54 INFO mapred.JobClient: map 100% reduce 0%
12/06/03 19:22:54 INFO mapred.Task: Task:attempt_local_0004_r_000000_0 is done. And is in the process of commiting
12/06/03 19:22:54 INFO mapred.LocalJobRunner:
12/06/03 19:22:54 INFO mapred.Task: Task attempt_local_0004_r_000000_0 is allowed to commit now
12/06/03 19:22:54 INFO output.FileOutputCommitter: Saved output of task 'attempt_local_0004_r_000000_0' to /tmp/mahout-work-jenkins/reuters-out-seqdir-sparse-lda/tf-vectors
12/06/03 19:22:56 INFO mapred.LocalJobRunner: reduce > reduce
12/06/03 19:22:56 INFO mapred.Task: Task 'attempt_local_0004_r_000000_0' done.
12/06/03 19:22:57 INFO mapred.JobClient: map 100% reduce 100%
12/06/03 19:22:57 INFO mapred.JobClient: Job complete: job_local_0004
12/06/03 19:22:57 INFO mapred.JobClient: Counters: 16
12/06/03 19:22:57 INFO mapred.JobClient: File Output Format Counters
12/06/03 19:22:57 INFO mapred.JobClient: Bytes Written=16112881
12/06/03 19:22:57 INFO mapred.JobClient: FileSystemCounters
12/06/03 19:22:57 INFO mapred.JobClient: FILE_BYTES_READ=372862490
12/06/03 19:22:57 INFO mapred.JobClient: FILE_BYTES_WRITTEN=338913223
12/06/03 19:22:57 INFO mapred.JobClient: File Input Format Counters
12/06/03 19:22:57 INFO mapred.JobClient: Bytes Read=16112881
12/06/03 19:22:57 INFO mapred.JobClient: Map-Reduce Framework
12/06/03 19:22:57 INFO mapred.JobClient: Reduce input groups=21578
12/06/03 19:22:57 INFO mapred.JobClient: Map output materialized bytes=15774466
12/06/03 19:22:57 INFO mapred.JobClient: Combine output records=0
12/06/03 19:22:57 INFO mapred.JobClient: Map input records=21578
12/06/03 19:22:57 INFO mapred.JobClient: Reduce shuffle bytes=0
12/06/03 19:22:57 INFO mapred.JobClient: Reduce output records=21578
12/06/03 19:22:57 INFO mapred.JobClient: Spilled Records=43156
12/06/03 19:22:57 INFO mapred.JobClient: Map output bytes=15691591
12/06/03 19:22:57 INFO mapred.JobClient: Combine input records=0
12/06/03 19:22:57 INFO mapred.JobClient: Map output records=21578
12/06/03 19:22:57 INFO mapred.JobClient: SPLIT_RAW_BYTES=155
12/06/03 19:22:57 INFO mapred.JobClient: Reduce input records=21578
12/06/03 19:22:57 INFO common.HadoopUtil: Deleting /tmp/mahout-work-jenkins/reuters-out-seqdir-sparse-lda/partial-vectors-0
12/06/03 19:22:57 INFO driver.MahoutDriver: Program took 53838 ms (Minutes: 0.8973)
hadoop binary is not in PATH,HADOOP_HOME/bin,HADOOP_PREFIX/bin, running locally
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:https://builds.apache.org/job/Mahout-Examples-Cluster-Reuters/ws/trunk/examples/target/mahout-examples-0.7-SNAPSHOT-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:https://builds.apache.org/job/Mahout-Examples-Cluster-Reuters/ws/trunk/examples/target/dependency/slf4j-jcl-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:https://builds.apache.org/job/Mahout-Examples-Cluster-Reuters/ws/trunk/examples/target/dependency/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
12/06/03 19:22:58 WARN driver.MahoutDriver: Unable to add class: lda
java.lang.ClassNotFoundException: lda
	at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:169)
	at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:225)
	at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:121)
12/06/03 19:22:58 WARN driver.MahoutDriver: No lda.props found on classpath, will use command-line arguments only
Unknown program 'lda' chosen.
Valid program names are:
  arff.vector: : Generate Vectors from an ARFF file or directory
  baumwelch: : Baum-Welch algorithm for unsupervised HMM training
  canopy: : Canopy clustering
  cat: : Print a file or resource as the logistic regression models would see it
  cleansvd: : Cleanup and verification of SVD output
  clusterdump: : Dump cluster output to text
  clusterpp: : Groups Clustering Output In Clusters
  cmdump: : Dump confusion matrix in HTML or text formats
  cvb: : LDA via Collapsed Variation Bayes (0th deriv. approx)
  cvb0_local: : LDA via Collapsed Variation Bayes, in memory locally.
  dirichlet: : Dirichlet Clustering
  eigencuts: : Eigencuts spectral clustering
  evaluateFactorization: : compute RMSE and MAE of a rating matrix factorization against probes
  fkmeans: : Fuzzy K-means clustering
  fpg: : Frequent Pattern Growth
  hmmpredict: : Generate random sequence of observations by given HMM
  itemsimilarity: : Compute the item-item-similarities for item-based collaborative filtering
  kmeans: : K-means clustering
  lucene.vector: : Generate Vectors from a Lucene index
  matrixdump: : Dump matrix in CSV format
  matrixmult: : Take the product of two matrices
  meanshift: : Mean Shift clustering
  minhash: : Run Minhash clustering
  parallelALS: : ALS-WR factorization of a rating matrix
  recommendfactorized: : Compute recommendations using the factorization of a rating matrix
  recommenditembased: : Compute recommendations using item-based collaborative filtering
  regexconverter: : Convert text files on a per line basis based on regular expressions
  rowid: : Map SequenceFile<Text,VectorWritable> to {SequenceFile<IntWritable,VectorWritable>, SequenceFile<IntWritable,Text>}
  rowsimilarity: : Compute the pairwise similarities of the rows of a matrix
  runAdaptiveLogistic: : Score new production data using a probably trained and validated AdaptivelogisticRegression model
  runlogistic: : Run a logistic regression model against CSV data
  seq2encoded: : Encoded Sparse Vector generation from Text sequence files
  seq2sparse: : Sparse Vector generation from Text sequence files
  seqdirectory: : Generate sequence files (of Text) from a directory
  seqdumper: : Generic Sequence File dumper
  seqmailarchives: : Creates SequenceFile from a directory containing gzipped mail archives
  seqwiki: : Wikipedia xml dump to sequence file
  spectralkmeans: : Spectral k-means clustering
  split: : Split Input data into test and train sets
  splitDataset: : split a rating dataset into training and probe parts
  ssvd: : Stochastic SVD
  svd: : Lanczos Singular Value Decomposition
  testnb: : Test the Vector-based Bayes classifier
  trainAdaptiveLogistic: : Train an AdaptivelogisticRegression model
  trainlogistic: : Train a logistic regression using stochastic gradient descent
  trainnb: : Train the Vector-based Bayes classifier
  transpose: : Take the transpose of a matrix
  validateAdaptiveLogistic: : Validate an AdaptivelogisticRegression model against hold-out data set
  vecdist: : Compute the distances between a set of Vectors (or Cluster or Canopy, they must fit in memory) and a list of Vectors
  vectordump: : Dump vectors from a sequence file to text
  viterbi: : Viterbi decoding of hidden states from given output states sequence

Build step 'Execute shell' marked build as failure
