[
https://issues.apache.org/jira/browse/MAHOUT-1034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13409410#comment-13409410
]
jayghost edited comment on MAHOUT-1034 at 7/9/12 1:09 PM:
----------------------------------------------------------
Hi Leting Wu, did you solve the problem? I hit the same error as you. I use
Hadoop 1.0.1 and Mahout 0.7, with Ubuntu 12.04 on the namenode and two
Ubuntu 10.10 datanodes.
I executed classify-20newsgroups.sh step by step. It failed at the
"./bin/mahout trainnb" step, with the same error as yours.
I must run in a Hadoop cluster environment. I think the problem is in the
'trainnb' command. Can anybody help me?
{hadoop@master:~$ cd program/mahout-distribution-0.7/
hadoop@master:~/program/mahout-distribution-0.7$ bin/mahout trainnb -i
~/Downloads/20news-bydate/20news-bydate-test-vectors/tfidf-vectors -el -o
~/Downloads/20news-bydate/model -li ~/Downloads/20news-bydate/labelindex -ow
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Warning: $HADOOP_HOME is deprecated.
Running on hadoop, using /home/hadoop/program/hadoop-1.0.1/bin/hadoop and
HADOOP_CONF_DIR=/home/hadoop/program/hadoop-1.0.1/conf
MAHOUT-JOB:
/home/hadoop/program/mahout-distribution-0.7/mahout-examples-0.7-job.jar
Warning: $HADOOP_HOME is deprecated.
12/07/09 20:43:56 WARN driver.MahoutDriver: No trainnb.props found on
classpath, will use command-line arguments only
12/07/09 20:43:57 INFO common.AbstractJob: Command line arguments:
{--alphaI=[1.0], --endPhase=[2147483647], --extractLabels=null,
--input=[/home/hadoop/Downloads/20news-bydate/20news-bydate-test-vectors/tfidf-vectors],
--labelIndex=[/home/hadoop/Downloads/20news-bydate/labelindex],
--output=[/home/hadoop/Downloads/20news-bydate/model], --overwrite=null,
--startPhase=[0], --tempDir=[temp]}
12/07/09 20:44:03 INFO common.HadoopUtil: Deleting temp
****/home/hadoop/Downloads/20news-bydate/20news-bydate-test-vectors/tfidf-vectors
12/07/09 20:44:19 INFO input.FileInputFormat: Total input paths to process : 1
12/07/09 20:44:21 INFO mapred.JobClient: Running job: job_201207092040_0001
12/07/09 20:44:22 INFO mapred.JobClient: map 0% reduce 0%
12/07/09 20:46:55 INFO mapred.JobClient: map 100% reduce 0%
12/07/09 20:47:36 INFO mapred.JobClient: map 100% reduce 100%
12/07/09 20:47:41 INFO mapred.JobClient: Job complete: job_201207092040_0001
12/07/09 20:47:41 INFO mapred.JobClient: Counters: 29
12/07/09 20:47:41 INFO mapred.JobClient: Job Counters
12/07/09 20:47:41 INFO mapred.JobClient: Launched reduce tasks=1
12/07/09 20:47:41 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=63099
12/07/09 20:47:41 INFO mapred.JobClient: Total time spent by all reduces
waiting after reserving slots (ms)=0
12/07/09 20:47:41 INFO mapred.JobClient: Total time spent by all maps
waiting after reserving slots (ms)=0
12/07/09 20:47:41 INFO mapred.JobClient: Launched map tasks=1
12/07/09 20:47:41 INFO mapred.JobClient: Data-local map tasks=1
12/07/09 20:47:41 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=38314
12/07/09 20:47:41 INFO mapred.JobClient: File Output Format Counters
12/07/09 20:47:41 INFO mapred.JobClient: Bytes Written=97
12/07/09 20:47:41 INFO mapred.JobClient: FileSystemCounters
12/07/09 20:47:41 INFO mapred.JobClient: FILE_BYTES_READ=22
12/07/09 20:47:41 INFO mapred.JobClient: HDFS_BYTES_READ=348
12/07/09 20:47:41 INFO mapred.JobClient: FILE_BYTES_WRITTEN=45839
12/07/09 20:47:41 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=97
12/07/09 20:47:41 INFO mapred.JobClient: File Input Format Counters
12/07/09 20:47:41 INFO mapred.JobClient: Bytes Read=90
12/07/09 20:47:41 INFO mapred.JobClient: Map-Reduce Framework
12/07/09 20:47:41 INFO mapred.JobClient: Map output materialized bytes=14
12/07/09 20:47:41 INFO mapred.JobClient: Map input records=0
12/07/09 20:47:41 INFO mapred.JobClient: Reduce shuffle bytes=14
12/07/09 20:47:41 INFO mapred.JobClient: Spilled Records=0
12/07/09 20:47:41 INFO mapred.JobClient: Map output bytes=0
12/07/09 20:47:41 INFO mapred.JobClient: CPU time spent (ms)=7210
12/07/09 20:47:41 INFO mapred.JobClient: Total committed heap usage
(bytes)=207880192
12/07/09 20:47:41 INFO mapred.JobClient: Combine input records=0
12/07/09 20:47:41 INFO mapred.JobClient: SPLIT_RAW_BYTES=173
12/07/09 20:47:41 INFO mapred.JobClient: Reduce input records=0
12/07/09 20:47:41 INFO mapred.JobClient: Reduce input groups=0
12/07/09 20:47:41 INFO mapred.JobClient: Combine output records=0
12/07/09 20:47:41 INFO mapred.JobClient: Physical memory (bytes)
snapshot=178298880
12/07/09 20:47:41 INFO mapred.JobClient: Reduce output records=0
12/07/09 20:47:41 INFO mapred.JobClient: Virtual memory (bytes)
snapshot=752775168
12/07/09 20:47:41 INFO mapred.JobClient: Map output records=0
****temp/summedObservations
12/07/09 20:47:51 INFO input.FileInputFormat: Total input paths to process : 1
12/07/09 20:47:52 INFO mapred.JobClient: Running job: job_201207092040_0002
12/07/09 20:47:53 INFO mapred.JobClient: map 0% reduce 0%
12/07/09 20:49:14 INFO mapred.JobClient: Task Id :
attempt_201207092040_0002_m_000000_0, Status : FAILED
java.lang.IllegalArgumentException
at
com.google.common.base.Preconditions.checkArgument(Preconditions.java:72)
at
org.apache.mahout.classifier.naivebayes.training.WeightsMapper.setup(WeightsMapper.java:42)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
12/07/09 20:50:19 INFO mapred.JobClient: Task Id :
attempt_201207092040_0002_m_000000_1, Status : FAILED
java.lang.IllegalArgumentException
at
com.google.common.base.Preconditions.checkArgument(Preconditions.java:72)
at
org.apache.mahout.classifier.naivebayes.training.WeightsMapper.setup(WeightsMapper.java:42)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
12/07/09 20:50:37 INFO mapred.JobClient: Task Id :
attempt_201207092040_0002_m_000000_2, Status : FAILED
java.lang.IllegalArgumentException
at
com.google.common.base.Preconditions.checkArgument(Preconditions.java:72)
at
org.apache.mahout.classifier.naivebayes.training.WeightsMapper.setup(WeightsMapper.java:42)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
attempt_201207092040_0002_m_000000_2: log4j:WARN No appenders could be found
for logger (org.apache.hadoop.mapred.Task).
attempt_201207092040_0002_m_000000_2: log4j:WARN Please initialize the log4j
system properly.
12/07/09 20:51:09 INFO mapred.JobClient: Job complete: job_201207092040_0002
12/07/09 20:51:09 INFO mapred.JobClient: Counters: 7
12/07/09 20:51:09 INFO mapred.JobClient: Job Counters
12/07/09 20:51:09 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=119359
12/07/09 20:51:09 INFO mapred.JobClient: Total time spent by all reduces
waiting after reserving slots (ms)=0
12/07/09 20:51:09 INFO mapred.JobClient: Total time spent by all maps
waiting after reserving slots (ms)=0
12/07/09 20:51:09 INFO mapred.JobClient: Launched map tasks=4
12/07/09 20:51:09 INFO mapred.JobClient: Data-local map tasks=4
12/07/09 20:51:09 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
12/07/09 20:51:09 INFO mapred.JobClient: Failed map tasks=1
12/07/09 20:51:09 INFO driver.MahoutDriver: Program took 433140 ms (Minutes:
7.219)}
Thanks!
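One detail in the log above may be worth checking: the first job
(job_201207092040_0001) reports "Map input records=0", so the second job may
have had nothing to work from. Whether that is the actual cause of the
IllegalArgumentException is not confirmed here. The shell sketch below is only
a diagnostic aid, not a fix. It assumes the console output was saved to a file,
and zero_input_jobs is a hypothetical helper name, not part of Hadoop or Mahout.
It prints the ID of each job whose counters show zero map input records.

```shell
# Hypothetical helper: list jobs in a saved JobClient console log that
# read zero map input records. Reads the named files, or stdin if none.
zero_input_jobs() {
  awk '
    # Remember the most recent job ID announced by JobClient.
    /Running job: / { job = $NF }
    # A counter line ending in exactly "Map input records=0" flags that job.
    /Map input records=0$/ { print job }
  ' "$@"
}
```

For example, with the output saved as trainnb.log (a hypothetical file name),
running "zero_input_jobs trainnb.log" on the log above should print
job_201207092040_0001.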
> ERROR in Naive Bayes Training(trainnb)
> --------------------------------------
>
> Key: MAHOUT-1034
> URL: https://issues.apache.org/jira/browse/MAHOUT-1034
> Project: Mahout
> Issue Type: Bug
> Components: Classification
> Affects Versions: 0.7
> Environment: Ubuntu 11.04
> Reporter: Leting Wu
> Priority: Critical
>
> When running either examples/classify-20newsgroups.sh or asf-email-examples.sh,
> trainnb always fails:
> {noformat}
> INFO mapred.JobClient: Task Id : attempt_201206281546_0003_m_000000_0, Status
> : FAILED
> java.lang.IllegalArgumentException
> at
> com.google.common.base.Preconditions.checkArgument(Preconditions.java:72)
> at
> org.apache.mahout.classifier.naivebayes.training.WeightsMapper.setup(WeightsMapper.java:42)
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1177)
> at org.apache.hadoop.mapred.Child.main(Child.java:264)
> {noformat}