See <https://builds.apache.org/job/Mahout-Examples-Cluster-Reuters/152/changes>

Changes:

[ssc] MAHOUT-986 Remove old LDA implementation from codebase

[robinanil] MAHOUT-1006 making end to end example work

[gsingers] MAHOUT-1023: make sure getConf gets a real conf

------------------------------------------
[...truncated 7280 lines...]
12/06/03 19:22:24 INFO mapred.MapTask: kvstart = 0; kvend = 262144; length = 327680
12/06/03 19:22:24 INFO mapred.MapTask: Finished spill 0
12/06/03 19:22:25 INFO mapred.MapTask: Starting flush of map output
12/06/03 19:22:25 INFO mapred.MapTask: Finished spill 1
12/06/03 19:22:25 INFO mapred.Merger: Merging 2 sorted segments
12/06/03 19:22:25 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 880434 bytes
12/06/03 19:22:25 INFO mapred.Task: Task:attempt_local_0002_m_000002_0 is done. And is in the process of commiting
12/06/03 19:22:27 INFO mapred.LocalJobRunner: 
12/06/03 19:22:27 INFO mapred.Task: Task 'attempt_local_0002_m_000002_0' done.
12/06/03 19:22:27 INFO mapred.MapTask: io.sort.mb = 100
12/06/03 19:22:27 INFO mapred.MapTask: data buffer = 79691776/99614720
12/06/03 19:22:27 INFO mapred.MapTask: record buffer = 262144/327680
12/06/03 19:22:27 INFO mapred.MapTask: Spilling map output: record full = true
12/06/03 19:22:27 INFO mapred.MapTask: bufstart = 0; bufend = 3828741; bufvoid = 99614720
12/06/03 19:22:27 INFO mapred.MapTask: kvstart = 0; kvend = 262144; length = 327680
12/06/03 19:22:27 INFO mapred.MapTask: Starting flush of map output
12/06/03 19:22:27 INFO mapred.MapTask: Finished spill 0
12/06/03 19:22:28 INFO mapred.MapTask: Finished spill 1
12/06/03 19:22:28 INFO mapred.Merger: Merging 2 sorted segments
12/06/03 19:22:28 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 724150 bytes
12/06/03 19:22:28 INFO mapred.Task: Task:attempt_local_0002_m_000003_0 is done. And is in the process of commiting
12/06/03 19:22:30 INFO mapred.LocalJobRunner: 
12/06/03 19:22:30 INFO mapred.Task: Task 'attempt_local_0002_m_000003_0' done.
12/06/03 19:22:30 INFO mapred.LocalJobRunner: 
12/06/03 19:22:30 INFO mapred.Merger: Merging 4 sorted segments
12/06/03 19:22:30 INFO mapred.Merger: Down to the last merge-pass, with 4 segments left of total size: 3360542 bytes
12/06/03 19:22:30 INFO mapred.LocalJobRunner: 
12/06/03 19:22:30 INFO mapred.Task: Task:attempt_local_0002_r_000000_0 is done. And is in the process of commiting
12/06/03 19:22:30 INFO mapred.LocalJobRunner: 
12/06/03 19:22:30 INFO mapred.Task: Task attempt_local_0002_r_000000_0 is allowed to commit now
12/06/03 19:22:30 INFO output.FileOutputCommitter: Saved output of task 'attempt_local_0002_r_000000_0' to /tmp/mahout-work-jenkins/reuters-out-seqdir-sparse-lda/wordcount
12/06/03 19:22:33 INFO mapred.LocalJobRunner: reduce > reduce
12/06/03 19:22:33 INFO mapred.Task: Task 'attempt_local_0002_r_000000_0' done.
12/06/03 19:22:34 INFO mapred.JobClient:  map 100% reduce 100%
12/06/03 19:22:34 INFO mapred.JobClient: Job complete: job_local_0002
12/06/03 19:22:34 INFO mapred.JobClient: Counters: 16
12/06/03 19:22:34 INFO mapred.JobClient:   File Output Format Counters 
12/06/03 19:22:34 INFO mapred.JobClient:     Bytes Written=1000743
12/06/03 19:22:34 INFO mapred.JobClient:   FileSystemCounters
12/06/03 19:22:34 INFO mapred.JobClient:     FILE_BYTES_READ=385720347
12/06/03 19:22:34 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=326116531
12/06/03 19:22:34 INFO mapred.JobClient:   File Input Format Counters 
12/06/03 19:22:34 INFO mapred.JobClient:     Bytes Read=15194231
12/06/03 19:22:34 INFO mapred.JobClient:   Map-Reduce Framework
12/06/03 19:22:34 INFO mapred.JobClient:     Reduce input groups=73413
12/06/03 19:22:34 INFO mapred.JobClient:     Map output materialized bytes=3360558
12/06/03 19:22:34 INFO mapred.JobClient:     Combine output records=190745
12/06/03 19:22:34 INFO mapred.JobClient:     Map input records=21578
12/06/03 19:22:34 INFO mapred.JobClient:     Reduce shuffle bytes=0
12/06/03 19:22:34 INFO mapred.JobClient:     Reduce output records=41807
12/06/03 19:22:34 INFO mapred.JobClient:     Spilled Records=572235
12/06/03 19:22:34 INFO mapred.JobClient:     Map output bytes=22501851
12/06/03 19:22:34 INFO mapred.JobClient:     Combine input records=1540960
12/06/03 19:22:34 INFO mapred.JobClient:     Map output records=1540960
12/06/03 19:22:34 INFO mapred.JobClient:     SPLIT_RAW_BYTES=628
12/06/03 19:22:34 INFO mapred.JobClient:     Reduce input records=190745
12/06/03 19:22:34 INFO input.FileInputFormat: Total input paths to process : 4
12/06/03 19:22:34 INFO filecache.TrackerDistributedCacheManager: Creating dictionary.file-0 in /tmp/hadoop-jenkins/mapred/local/archive/-6328907927761479619_-1970001630_869041919/file/tmp/mahout-work-jenkins/reuters-out-seqdir-sparse-lda-work-5627208055286448987 with rwxr-xr-x
12/06/03 19:22:34 INFO filecache.TrackerDistributedCacheManager: Cached /tmp/mahout-work-jenkins/reuters-out-seqdir-sparse-lda/dictionary.file-0 as /tmp/hadoop-jenkins/mapred/local/archive/-6328907927761479619_-1970001630_869041919/file/tmp/mahout-work-jenkins/reuters-out-seqdir-sparse-lda/dictionary.file-0
12/06/03 19:22:34 INFO filecache.TrackerDistributedCacheManager: Cached /tmp/mahout-work-jenkins/reuters-out-seqdir-sparse-lda/dictionary.file-0 as /tmp/hadoop-jenkins/mapred/local/archive/-6328907927761479619_-1970001630_869041919/file/tmp/mahout-work-jenkins/reuters-out-seqdir-sparse-lda/dictionary.file-0
12/06/03 19:22:34 INFO mapred.JobClient: Running job: job_local_0003
12/06/03 19:22:34 INFO mapred.MapTask: io.sort.mb = 100
12/06/03 19:22:34 INFO mapred.MapTask: data buffer = 79691776/99614720
12/06/03 19:22:34 INFO mapred.MapTask: record buffer = 262144/327680
12/06/03 19:22:35 INFO mapred.MapTask: Starting flush of map output
12/06/03 19:22:35 INFO mapred.MapTask: Finished spill 0
12/06/03 19:22:35 INFO mapred.Task: Task:attempt_local_0003_m_000000_0 is done. And is in the process of commiting
12/06/03 19:22:35 INFO mapred.JobClient:  map 0% reduce 0%
12/06/03 19:22:37 INFO mapred.LocalJobRunner: 
12/06/03 19:22:37 INFO mapred.Task: Task 'attempt_local_0003_m_000000_0' done.
12/06/03 19:22:37 INFO mapred.MapTask: io.sort.mb = 100
12/06/03 19:22:37 INFO mapred.MapTask: data buffer = 79691776/99614720
12/06/03 19:22:37 INFO mapred.MapTask: record buffer = 262144/327680
12/06/03 19:22:38 INFO mapred.MapTask: Starting flush of map output
12/06/03 19:22:38 INFO mapred.MapTask: Finished spill 0
12/06/03 19:22:38 INFO mapred.Task: Task:attempt_local_0003_m_000001_0 is done. And is in the process of commiting
12/06/03 19:22:38 INFO mapred.JobClient:  map 100% reduce 0%
12/06/03 19:22:40 INFO mapred.LocalJobRunner: 
12/06/03 19:22:40 INFO mapred.Task: Task 'attempt_local_0003_m_000001_0' done.
12/06/03 19:22:40 INFO mapred.MapTask: io.sort.mb = 100
12/06/03 19:22:40 INFO mapred.MapTask: data buffer = 79691776/99614720
12/06/03 19:22:40 INFO mapred.MapTask: record buffer = 262144/327680
12/06/03 19:22:41 INFO mapred.MapTask: Starting flush of map output
12/06/03 19:22:41 INFO mapred.MapTask: Finished spill 0
12/06/03 19:22:41 INFO mapred.Task: Task:attempt_local_0003_m_000002_0 is done. And is in the process of commiting
12/06/03 19:22:43 INFO mapred.LocalJobRunner: 
12/06/03 19:22:43 INFO mapred.Task: Task 'attempt_local_0003_m_000002_0' done.
12/06/03 19:22:43 INFO mapred.MapTask: io.sort.mb = 100
12/06/03 19:22:43 INFO mapred.MapTask: data buffer = 79691776/99614720
12/06/03 19:22:43 INFO mapred.MapTask: record buffer = 262144/327680
12/06/03 19:22:44 INFO mapred.MapTask: Starting flush of map output
12/06/03 19:22:44 INFO mapred.MapTask: Finished spill 0
12/06/03 19:22:44 INFO mapred.Task: Task:attempt_local_0003_m_000003_0 is done. And is in the process of commiting
12/06/03 19:22:46 INFO mapred.LocalJobRunner: 
12/06/03 19:22:46 INFO mapred.Task: Task 'attempt_local_0003_m_000003_0' done.
12/06/03 19:22:46 INFO mapred.LocalJobRunner: 
12/06/03 19:22:46 INFO mapred.Merger: Merging 4 sorted segments
12/06/03 19:22:46 INFO mapred.Merger: Down to the last merge-pass, with 4 segments left of total size: 14870368 bytes
12/06/03 19:22:46 INFO mapred.LocalJobRunner: 
12/06/03 19:22:49 INFO mapred.Task: Task:attempt_local_0003_r_000000_0 is done. And is in the process of commiting
12/06/03 19:22:49 INFO mapred.LocalJobRunner: 
12/06/03 19:22:49 INFO mapred.Task: Task attempt_local_0003_r_000000_0 is allowed to commit now
12/06/03 19:22:49 INFO output.FileOutputCommitter: Saved output of task 'attempt_local_0003_r_000000_0' to /tmp/mahout-work-jenkins/reuters-out-seqdir-sparse-lda/partial-vectors-0
12/06/03 19:22:49 INFO mapred.LocalJobRunner: reduce > reduce
12/06/03 19:22:49 INFO mapred.Task: Task 'attempt_local_0003_r_000000_0' done.
12/06/03 19:22:50 INFO mapred.JobClient:  map 100% reduce 100%
12/06/03 19:22:50 INFO mapred.JobClient: Job complete: job_local_0003
12/06/03 19:22:50 INFO mapred.JobClient: Counters: 16
12/06/03 19:22:50 INFO mapred.JobClient:   File Output Format Counters 
12/06/03 19:22:50 INFO mapred.JobClient:     Bytes Written=16112881
12/06/03 19:22:50 INFO mapred.JobClient:   FileSystemCounters
12/06/03 19:22:50 INFO mapred.JobClient:     FILE_BYTES_READ=616355240
12/06/03 19:22:50 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=530105780
12/06/03 19:22:50 INFO mapred.JobClient:   File Input Format Counters 
12/06/03 19:22:50 INFO mapred.JobClient:     Bytes Read=15194231
12/06/03 19:22:50 INFO mapred.JobClient:   Map-Reduce Framework
12/06/03 19:22:50 INFO mapred.JobClient:     Reduce input groups=21578
12/06/03 19:22:50 INFO mapred.JobClient:     Map output materialized bytes=14870384
12/06/03 19:22:50 INFO mapred.JobClient:     Combine output records=0
12/06/03 19:22:50 INFO mapred.JobClient:     Map input records=21578
12/06/03 19:22:50 INFO mapred.JobClient:     Reduce shuffle bytes=0
12/06/03 19:22:50 INFO mapred.JobClient:     Reduce output records=21578
12/06/03 19:22:50 INFO mapred.JobClient:     Spilled Records=43156
12/06/03 19:22:50 INFO mapred.JobClient:     Map output bytes=14791487
12/06/03 19:22:50 INFO mapred.JobClient:     Combine input records=0
12/06/03 19:22:50 INFO mapred.JobClient:     Map output records=21578
12/06/03 19:22:50 INFO mapred.JobClient:     SPLIT_RAW_BYTES=628
12/06/03 19:22:50 INFO mapred.JobClient:     Reduce input records=21578
12/06/03 19:22:50 INFO common.HadoopUtil: Deleting /tmp/mahout-work-jenkins/reuters-out-seqdir-sparse-lda/tf-vectors
12/06/03 19:22:50 INFO input.FileInputFormat: Total input paths to process : 1
12/06/03 19:22:50 INFO mapred.JobClient: Running job: job_local_0004
12/06/03 19:22:50 INFO mapred.MapTask: io.sort.mb = 100
12/06/03 19:22:50 INFO mapred.MapTask: data buffer = 79691776/99614720
12/06/03 19:22:50 INFO mapred.MapTask: record buffer = 262144/327680
12/06/03 19:22:51 INFO mapred.MapTask: Starting flush of map output
12/06/03 19:22:51 INFO mapred.MapTask: Finished spill 0
12/06/03 19:22:51 INFO mapred.Task: Task:attempt_local_0004_m_000000_0 is done. And is in the process of commiting
12/06/03 19:22:51 INFO mapred.JobClient:  map 0% reduce 0%
12/06/03 19:22:53 INFO mapred.LocalJobRunner: 
12/06/03 19:22:53 INFO mapred.Task: Task 'attempt_local_0004_m_000000_0' done.
12/06/03 19:22:53 INFO mapred.LocalJobRunner: 
12/06/03 19:22:53 INFO mapred.Merger: Merging 1 sorted segments
12/06/03 19:22:53 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 15774462 bytes
12/06/03 19:22:53 INFO mapred.LocalJobRunner: 
12/06/03 19:22:54 INFO mapred.JobClient:  map 100% reduce 0%
12/06/03 19:22:54 INFO mapred.Task: Task:attempt_local_0004_r_000000_0 is done. And is in the process of commiting
12/06/03 19:22:54 INFO mapred.LocalJobRunner: 
12/06/03 19:22:54 INFO mapred.Task: Task attempt_local_0004_r_000000_0 is allowed to commit now
12/06/03 19:22:54 INFO output.FileOutputCommitter: Saved output of task 'attempt_local_0004_r_000000_0' to /tmp/mahout-work-jenkins/reuters-out-seqdir-sparse-lda/tf-vectors
12/06/03 19:22:56 INFO mapred.LocalJobRunner: reduce > reduce
12/06/03 19:22:56 INFO mapred.Task: Task 'attempt_local_0004_r_000000_0' done.
12/06/03 19:22:57 INFO mapred.JobClient:  map 100% reduce 100%
12/06/03 19:22:57 INFO mapred.JobClient: Job complete: job_local_0004
12/06/03 19:22:57 INFO mapred.JobClient: Counters: 16
12/06/03 19:22:57 INFO mapred.JobClient:   File Output Format Counters 
12/06/03 19:22:57 INFO mapred.JobClient:     Bytes Written=16112881
12/06/03 19:22:57 INFO mapred.JobClient:   FileSystemCounters
12/06/03 19:22:57 INFO mapred.JobClient:     FILE_BYTES_READ=372862490
12/06/03 19:22:57 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=338913223
12/06/03 19:22:57 INFO mapred.JobClient:   File Input Format Counters 
12/06/03 19:22:57 INFO mapred.JobClient:     Bytes Read=16112881
12/06/03 19:22:57 INFO mapred.JobClient:   Map-Reduce Framework
12/06/03 19:22:57 INFO mapred.JobClient:     Reduce input groups=21578
12/06/03 19:22:57 INFO mapred.JobClient:     Map output materialized bytes=15774466
12/06/03 19:22:57 INFO mapred.JobClient:     Combine output records=0
12/06/03 19:22:57 INFO mapred.JobClient:     Map input records=21578
12/06/03 19:22:57 INFO mapred.JobClient:     Reduce shuffle bytes=0
12/06/03 19:22:57 INFO mapred.JobClient:     Reduce output records=21578
12/06/03 19:22:57 INFO mapred.JobClient:     Spilled Records=43156
12/06/03 19:22:57 INFO mapred.JobClient:     Map output bytes=15691591
12/06/03 19:22:57 INFO mapred.JobClient:     Combine input records=0
12/06/03 19:22:57 INFO mapred.JobClient:     Map output records=21578
12/06/03 19:22:57 INFO mapred.JobClient:     SPLIT_RAW_BYTES=155
12/06/03 19:22:57 INFO mapred.JobClient:     Reduce input records=21578
12/06/03 19:22:57 INFO common.HadoopUtil: Deleting /tmp/mahout-work-jenkins/reuters-out-seqdir-sparse-lda/partial-vectors-0
12/06/03 19:22:57 INFO driver.MahoutDriver: Program took 53838 ms (Minutes: 0.8973)
hadoop binary is not in PATH,HADOOP_HOME/bin,HADOOP_PREFIX/bin, running locally
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:https://builds.apache.org/job/Mahout-Examples-Cluster-Reuters/ws/trunk/examples/target/mahout-examples-0.7-SNAPSHOT-job.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:https://builds.apache.org/job/Mahout-Examples-Cluster-Reuters/ws/trunk/examples/target/dependency/slf4j-jcl-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:https://builds.apache.org/job/Mahout-Examples-Cluster-Reuters/ws/trunk/examples/target/dependency/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
12/06/03 19:22:58 WARN driver.MahoutDriver: Unable to add class: lda
java.lang.ClassNotFoundException: lda
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:169)
        at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:225)
        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:121)
12/06/03 19:22:58 WARN driver.MahoutDriver: No lda.props found on classpath, will use command-line arguments only
Unknown program 'lda' chosen.
Valid program names are:
  arff.vector: : Generate Vectors from an ARFF file or directory
  baumwelch: : Baum-Welch algorithm for unsupervised HMM training
  canopy: : Canopy clustering
  cat: : Print a file or resource as the logistic regression models would see it
  cleansvd: : Cleanup and verification of SVD output
  clusterdump: : Dump cluster output to text
  clusterpp: : Groups Clustering Output In Clusters
  cmdump: : Dump confusion matrix in HTML or text formats
  cvb: : LDA via Collapsed Variation Bayes (0th deriv. approx)
  cvb0_local: : LDA via Collapsed Variation Bayes, in memory locally.
  dirichlet: : Dirichlet Clustering
  eigencuts: : Eigencuts spectral clustering
  evaluateFactorization: : compute RMSE and MAE of a rating matrix factorization against probes
  fkmeans: : Fuzzy K-means clustering
  fpg: : Frequent Pattern Growth
  hmmpredict: : Generate random sequence of observations by given HMM
  itemsimilarity: : Compute the item-item-similarities for item-based collaborative filtering
  kmeans: : K-means clustering
  lucene.vector: : Generate Vectors from a Lucene index
  matrixdump: : Dump matrix in CSV format
  matrixmult: : Take the product of two matrices
  meanshift: : Mean Shift clustering
  minhash: : Run Minhash clustering
  parallelALS: : ALS-WR factorization of a rating matrix
  recommendfactorized: : Compute recommendations using the factorization of a rating matrix
  recommenditembased: : Compute recommendations using item-based collaborative filtering
  regexconverter: : Convert text files on a per line basis based on regular expressions
  rowid: : Map SequenceFile&lt;Text,VectorWritable&gt; to {SequenceFile&lt;IntWritable,VectorWritable&gt;, SequenceFile&lt;IntWritable,Text&gt;}
  rowsimilarity: : Compute the pairwise similarities of the rows of a matrix
  runAdaptiveLogistic: : Score new production data using a probably trained and validated AdaptivelogisticRegression model
  runlogistic: : Run a logistic regression model against CSV data
  seq2encoded: : Encoded Sparse Vector generation from Text sequence files
  seq2sparse: : Sparse Vector generation from Text sequence files
  seqdirectory: : Generate sequence files (of Text) from a directory
  seqdumper: : Generic Sequence File dumper
  seqmailarchives: : Creates SequenceFile from a directory containing gzipped mail archives
  seqwiki: : Wikipedia xml dump to sequence file
  spectralkmeans: : Spectral k-means clustering
  split: : Split Input data into test and train sets
  splitDataset: : split a rating dataset into training and probe parts
  ssvd: : Stochastic SVD
  svd: : Lanczos Singular Value Decomposition
  testnb: : Test the Vector-based Bayes classifier
  trainAdaptiveLogistic: : Train an AdaptivelogisticRegression model
  trainlogistic: : Train a logistic regression using stochastic gradient descent
  trainnb: : Train the Vector-based Bayes classifier
  transpose: : Take the transpose of a matrix
  validateAdaptiveLogistic: : Validate an AdaptivelogisticRegression model against hold-out data set
  vecdist: : Compute the distances between a set of Vectors (or Cluster or Canopy, they must fit in memory) and a list of Vectors
  vectordump: : Dump vectors from a sequence file to text
  viterbi: : Viterbi decoding of hidden states from given output states sequence
Build step 'Execute shell' marked build as failure
