[jira] [Comment Edited] (MAHOUT-1034) ERROR in Naive Bayes Training(trainnb)

jayghost (JIRA) Mon, 09 Jul 2012 06:10:39 -0700

    [ https://issues.apache.org/jira/browse/MAHOUT-1034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409410#comment-13409410 ]


jayghost edited comment on MAHOUT-1034 at 7/9/12 1:09 PM:
----------------------------------------------------------

Hi, Leting Wu, did you solve the problem? I'm getting the same error as you. I'm using Hadoop 1.0.1 and Mahout 0.7, with Ubuntu 12.04 as the namenode and two Ubuntu 10.10 machines as datanodes.
I ran classify-20newsgroups.sh step by step, and it failed at the "./bin/mahout trainnb" step with the same error output as yours.
I have to run this in a Hadoop cluster environment. I think the problem is in the 'trainnb' command. Can anybody help me?
{noformat}
hadoop@master:~$ cd program/mahout-distribution-0.7/
hadoop@master:~/program/mahout-distribution-0.7$ bin/mahout trainnb -i ~/Downloads/20news-bydate/20news-bydate-test-vectors/tfidf-vectors -el -o ~/Downloads/20news-bydate/model -li ~/Downloads/20news-bydate/labelindex -ow
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Warning: $HADOOP_HOME is deprecated.

Running on hadoop, using /home/hadoop/program/hadoop-1.0.1/bin/hadoop and HADOOP_CONF_DIR=/home/hadoop/program/hadoop-1.0.1/conf
MAHOUT-JOB: /home/hadoop/program/mahout-distribution-0.7/mahout-examples-0.7-job.jar
Warning: $HADOOP_HOME is deprecated.

12/07/09 20:43:56 WARN driver.MahoutDriver: No trainnb.props found on classpath, will use command-line arguments only
12/07/09 20:43:57 INFO common.AbstractJob: Command line arguments: {--alphaI=[1.0], --endPhase=[2147483647], --extractLabels=null, --input=[/home/hadoop/Downloads/20news-bydate/20news-bydate-test-vectors/tfidf-vectors], --labelIndex=[/home/hadoop/Downloads/20news-bydate/labelindex], --output=[/home/hadoop/Downloads/20news-bydate/model], --overwrite=null, --startPhase=[0], --tempDir=[temp]}
12/07/09 20:44:03 INFO common.HadoopUtil: Deleting temp
****/home/hadoop/Downloads/20news-bydate/20news-bydate-test-vectors/tfidf-vectors
12/07/09 20:44:19 INFO input.FileInputFormat: Total input paths to process : 1
12/07/09 20:44:21 INFO mapred.JobClient: Running job: job_201207092040_0001
12/07/09 20:44:22 INFO mapred.JobClient:  map 0% reduce 0%
12/07/09 20:46:55 INFO mapred.JobClient:  map 100% reduce 0%
12/07/09 20:47:36 INFO mapred.JobClient:  map 100% reduce 100%
12/07/09 20:47:41 INFO mapred.JobClient: Job complete: job_201207092040_0001
12/07/09 20:47:41 INFO mapred.JobClient: Counters: 29
12/07/09 20:47:41 INFO mapred.JobClient:   Job Counters 
12/07/09 20:47:41 INFO mapred.JobClient:     Launched reduce tasks=1
12/07/09 20:47:41 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=63099
12/07/09 20:47:41 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
12/07/09 20:47:41 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
12/07/09 20:47:41 INFO mapred.JobClient:     Launched map tasks=1
12/07/09 20:47:41 INFO mapred.JobClient:     Data-local map tasks=1
12/07/09 20:47:41 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=38314
12/07/09 20:47:41 INFO mapred.JobClient:   File Output Format Counters 
12/07/09 20:47:41 INFO mapred.JobClient:     Bytes Written=97
12/07/09 20:47:41 INFO mapred.JobClient:   FileSystemCounters
12/07/09 20:47:41 INFO mapred.JobClient:     FILE_BYTES_READ=22
12/07/09 20:47:41 INFO mapred.JobClient:     HDFS_BYTES_READ=348
12/07/09 20:47:41 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=45839
12/07/09 20:47:41 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=97
12/07/09 20:47:41 INFO mapred.JobClient:   File Input Format Counters 
12/07/09 20:47:41 INFO mapred.JobClient:     Bytes Read=90
12/07/09 20:47:41 INFO mapred.JobClient:   Map-Reduce Framework
12/07/09 20:47:41 INFO mapred.JobClient:     Map output materialized bytes=14
12/07/09 20:47:41 INFO mapred.JobClient:     Map input records=0
12/07/09 20:47:41 INFO mapred.JobClient:     Reduce shuffle bytes=14
12/07/09 20:47:41 INFO mapred.JobClient:     Spilled Records=0
12/07/09 20:47:41 INFO mapred.JobClient:     Map output bytes=0
12/07/09 20:47:41 INFO mapred.JobClient:     CPU time spent (ms)=7210
12/07/09 20:47:41 INFO mapred.JobClient:     Total committed heap usage (bytes)=207880192
12/07/09 20:47:41 INFO mapred.JobClient:     Combine input records=0
12/07/09 20:47:41 INFO mapred.JobClient:     SPLIT_RAW_BYTES=173
12/07/09 20:47:41 INFO mapred.JobClient:     Reduce input records=0
12/07/09 20:47:41 INFO mapred.JobClient:     Reduce input groups=0
12/07/09 20:47:41 INFO mapred.JobClient:     Combine output records=0
12/07/09 20:47:41 INFO mapred.JobClient:     Physical memory (bytes) snapshot=178298880
12/07/09 20:47:41 INFO mapred.JobClient:     Reduce output records=0
12/07/09 20:47:41 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=752775168
12/07/09 20:47:41 INFO mapred.JobClient:     Map output records=0
****temp/summedObservations
12/07/09 20:47:51 INFO input.FileInputFormat: Total input paths to process : 1
12/07/09 20:47:52 INFO mapred.JobClient: Running job: job_201207092040_0002
12/07/09 20:47:53 INFO mapred.JobClient:  map 0% reduce 0%
12/07/09 20:49:14 INFO mapred.JobClient: Task Id : attempt_201207092040_0002_m_000000_0, Status : FAILED
java.lang.IllegalArgumentException
        at com.google.common.base.Preconditions.checkArgument(Preconditions.java:72)
        at org.apache.mahout.classifier.naivebayes.training.WeightsMapper.setup(WeightsMapper.java:42)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
        at org.apache.hadoop.mapred.Child.main(Child.java:249)

12/07/09 20:50:19 INFO mapred.JobClient: Task Id : attempt_201207092040_0002_m_000000_1, Status : FAILED
java.lang.IllegalArgumentException
        at com.google.common.base.Preconditions.checkArgument(Preconditions.java:72)
        at org.apache.mahout.classifier.naivebayes.training.WeightsMapper.setup(WeightsMapper.java:42)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
        at org.apache.hadoop.mapred.Child.main(Child.java:249)

12/07/09 20:50:37 INFO mapred.JobClient: Task Id : attempt_201207092040_0002_m_000000_2, Status : FAILED
java.lang.IllegalArgumentException
        at com.google.common.base.Preconditions.checkArgument(Preconditions.java:72)
        at org.apache.mahout.classifier.naivebayes.training.WeightsMapper.setup(WeightsMapper.java:42)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
        at org.apache.hadoop.mapred.Child.main(Child.java:249)

attempt_201207092040_0002_m_000000_2: log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapred.Task).
attempt_201207092040_0002_m_000000_2: log4j:WARN Please initialize the log4j system properly.
12/07/09 20:51:09 INFO mapred.JobClient: Job complete: job_201207092040_0002
12/07/09 20:51:09 INFO mapred.JobClient: Counters: 7
12/07/09 20:51:09 INFO mapred.JobClient:   Job Counters 
12/07/09 20:51:09 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=119359
12/07/09 20:51:09 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
12/07/09 20:51:09 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
12/07/09 20:51:09 INFO mapred.JobClient:     Launched map tasks=4
12/07/09 20:51:09 INFO mapred.JobClient:     Data-local map tasks=4
12/07/09 20:51:09 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
12/07/09 20:51:09 INFO mapred.JobClient:     Failed map tasks=1
12/07/09 20:51:09 INFO driver.MahoutDriver: Program took 433140 ms (Minutes: 7.219)
{noformat}
Thanks!
                
> ERROR in Naive Bayes Training(trainnb)
> --------------------------------------
>
>                 Key: MAHOUT-1034
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1034
>             Project: Mahout
>          Issue Type: Bug
>          Components: Classification
>    Affects Versions: 0.7
>         Environment: Ubuntu 11.04
>            Reporter: Leting Wu
>            Priority: Critical
>
> When running either examples/classify-20newsgroups.sh or asf-email-examples.sh, trainnb always fails:
> {noformat}
> INFO mapred.JobClient: Task Id : attempt_201206281546_0003_m_000000_0, Status : FAILED
> java.lang.IllegalArgumentException
>       at com.google.common.base.Preconditions.checkArgument(Preconditions.java:72)
>       at org.apache.mahout.classifier.naivebayes.training.WeightsMapper.setup(WeightsMapper.java:42)
>       at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
>       at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
>       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
>       at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:396)
>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
>       at org.apache.hadoop.mapred.Child.main(Child.java:264)
> {noformat}

--
This message is automatically generated by JIRA.
