Ha Son Hai created MAHOUT-1658:
----------------------------------

             Summary: Kmeans fails when running on HDFS
                 Key: MAHOUT-1658
                 URL: https://issues.apache.org/jira/browse/MAHOUT-1658
             Project: Mahout
          Issue Type: Bug
          Components: Clustering
    Affects Versions: 0.9
         Environment: CentOS 6.6 with HDP 2.2
            Reporter: Ha Son Hai


Hi,
I was trying to run some of the Mahout examples on a Hadoop platform and saw that 
k-means succeeds when it runs on the local host. However, when it runs against 
HDFS with relative paths, Mahout looks for the intermediate results on the local 
file system instead of on HDFS.
I have to use absolute paths for the input and output to get k-means to run 
correctly.
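
To illustrate why the same relative path can end up in two different places, here is a minimal sketch using only java.net.URI. It is not Mahout's or Hadoop's actual resolution code; it only mimics how a relative path is resolved against a file system's working directory. The host name, port, and directories are hypothetical. (In Hadoop itself, FileSystem.makeQualified(Path) performs the real qualification.)

```java
import java.net.URI;

public class QualifyPath {
    // Resolve a possibly relative path against a working-directory URI.
    // The working directory must end with "/" so it is treated as a directory.
    static URI qualify(URI workingDir, String path) {
        return workingDir.resolve(path);
    }

    public static void main(String[] args) {
        // Hypothetical working directories; real values depend on the cluster.
        URI hdfsHome = URI.create("hdfs://namenode:8020/user/mahout/");
        URI localCwd = URI.create("file:///home/mahout/");

        // The same relative path lands on different file systems.
        System.out.println(qualify(hdfsHome, "output/clusters-0"));
        System.out.println(qualify(localCwd, "output/clusters-0"));

        // A fully qualified path is unaffected by the working directory.
        System.out.println(qualify(localCwd,
                "hdfs://namenode:8020/user/mahout/output/clusters-0"));
    }
}
```

If a task resolves "output/clusters-0" against the local file system, it will not find the directory the driver wrote to HDFS, which matches the RawLocalFileSystem frames in the trace below.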

Here is a typical error when running on HDFS:

15/03/26 12:15:07 INFO mapreduce.Job: Task Id : attempt_1426848955524_0062_m_000000_2, Status : FAILED
Error: java.lang.IllegalStateException: output/clusters-0
        at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterable.iterator(SequenceFileDirValueIterable.java:78)
        at org.apache.mahout.clustering.classify.ClusterClassifier.readFromSeqFiles(ClusterClassifier.java:208)
        at org.apache.mahout.clustering.iterator.CIMapper.setup(CIMapper.java:44)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:784)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.io.FileNotFoundException: File output/clusters-0 does not exist
        at org.apache.hadoop.fs.RawLocalFileSystem.listStatus(RawLocalFileSystem.java:376)
        at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1485)
        at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1525)
        at org.apache.hadoop.fs.ChecksumFileSystem.listStatus(ChecksumFileSystem.java:570)
        at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1485)
        at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1525)
        at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterator.<init>(SequenceFileDirValueIterator.java:70)
        at org.apache.mahout.common.iterator.sequencefile.SequenceFileDirValueIterable.iterator(SequenceFileDirValueIterable.java:76)
        ... 10 more

15/03/26 12:15:16 INFO mapreduce.Job:  map 100% reduce 0%
15/03/26 12:15:17 INFO mapreduce.Job:  map 100% reduce 100%
15/03/26 12:15:17 INFO mapreduce.Job: Job job_1426848955524_0062 failed with state FAILED due to: Task failed task_1426848955524_0062_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0

15/03/26 12:15:17 INFO mapreduce.Job: Counters: 9
        Job Counters
                Failed map tasks=4
                Launched map tasks=4
                Other local map tasks=3
                Rack-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=23087
                Total time spent by all reduces in occupied slots (ms)=0
                Total time spent by all map tasks (ms)=23087
                Total vcore-seconds taken by all map tasks=23087
                Total megabyte-seconds taken by all map tasks=23641088
Exception in thread "main" java.lang.InterruptedException: Cluster Iteration 1 failed processing output/clusters-1
        at org.apache.mahout.clustering.iterator.ClusterIterator.iterateMR(ClusterIterator.java:183)
        at org.apache.mahout.clustering.kmeans.KMeansDriver.buildClusters(KMeansDriver.java:224)
        at org.apache.mahout.clustering.kmeans.KMeansDriver.run(KMeansDriver.java:147)
        at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.run(Job.java:135)
        at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.main(Job.java:60)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
        at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:152)
        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
