RandomForest quits with ArrayIndexOutOfBoundsException while running sample
---------------------------------------------------------------------------

                 Key: MAHOUT-932
                 URL: https://issues.apache.org/jira/browse/MAHOUT-932
             Project: Mahout
          Issue Type: Bug
          Components: Classification
    Affects Versions: 0.6
         Environment: Mac OS X, the Java version currently shipped with Mac OS, latest trunk checkout from 2011-12-17

Dual Core MacBook Pro (2009), 8 GB RAM, SSD
            Reporter: Berttenfall M.
            Priority: Minor


Hello,

When running the example at https://cwiki.apache.org/MAHOUT/partial-implementation.html with the recommended data sets, several issues occur.
First, ARFF files no longer seem to be supported: with an ARFF file, Mahout quits while creating the description file (complaining about a wrong number of attributes in the descriptor string). I therefore used the UCI format instead, as recommended at https://cwiki.apache.org/MAHOUT/breiman-example.html, and with that format the description step works.
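
For reference, here is roughly what the working description step looked like on my machine (the attribute descriptor is the one given on the wiki page for the NSL-KDD data; paths are from my setup):
$MAHOUT_HOME/bin/mahout org.apache.mahout.classifier.df.tools.Describe -p convertedData/data.data -f KDDTrain+.info -d N 3 C 2 N C 4 N C 8 N 2 C 19 N L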

The main error happens during the BuildForest step (I could not test TestForest, since no forest is ever written).
Running:
$MAHOUT_HOME/bin/mahout org.apache.mahout.classifier.df.mapreduce.BuildForest -Dmapred.max.split.size=1874231 -d convertedData/data.data -ds KDDTrain+.info -sl 5 -p -t 100 -o nsl-forest

I tested different mapred.max.split.size values: 1874231, 187423, and 18742 all produce the error below; 1874 does not finish on my machine.
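
Judging by the counters, the log below is from the 18742 run: the input is about 20 MB (Bytes Read=20478569), and 20478569 / 18742 ≈ 1093 splits, which matches the roughly 1000 map attempts shown (the last is m_000999). That is far more partitions than the 100 trees requested with -t 100, which makes me suspect a partitions-versus-trees mismatch.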

The job quits after a while (the map phase is essentially complete) with the following output:
11/12/17 16:23:24 INFO mapred.Task: Task 'attempt_local_0001_m_000998_0' done.
11/12/17 16:23:24 INFO mapred.Task: Task:attempt_local_0001_m_000999_0 is done. And is in the process of commiting
11/12/17 16:23:24 INFO mapred.LocalJobRunner: 
11/12/17 16:23:24 INFO mapred.Task: Task attempt_local_0001_m_000999_0 is allowed to commit now
11/12/17 16:23:24 INFO output.FileOutputCommitter: Saved output of task 'attempt_local_0001_m_000999_0' to file:/Users/martin/Documents/Studium/Master/LargeScaleProcessing/Repository/mahout_algorithms_evaluation/testingRandomForests/nsl-forest
11/12/17 16:23:27 INFO mapred.LocalJobRunner: 
11/12/17 16:23:27 INFO mapred.Task: Task 'attempt_local_0001_m_000999_0' done.
11/12/17 16:23:28 INFO mapred.JobClient:  map 100% reduce 0%
11/12/17 16:23:28 INFO mapred.JobClient: Job complete: job_local_0001
11/12/17 16:23:28 INFO mapred.JobClient: Counters: 8
11/12/17 16:23:28 INFO mapred.JobClient:   File Output Format Counters
11/12/17 16:23:28 INFO mapred.JobClient:     Bytes Written=41869032
11/12/17 16:23:28 INFO mapred.JobClient:   FileSystemCounters
11/12/17 16:23:28 INFO mapred.JobClient:     FILE_BYTES_READ=37443033225
11/12/17 16:23:28 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=44946910704
11/12/17 16:23:28 INFO mapred.JobClient:   File Input Format Counters
11/12/17 16:23:28 INFO mapred.JobClient:     Bytes Read=20478569
11/12/17 16:23:28 INFO mapred.JobClient:   Map-Reduce Framework
11/12/17 16:23:28 INFO mapred.JobClient:     Map input records=125973
11/12/17 16:23:28 INFO mapred.JobClient:     Spilled Records=0
11/12/17 16:23:28 INFO mapred.JobClient:     Map output records=100000
11/12/17 16:23:28 INFO mapred.JobClient:     SPLIT_RAW_BYTES=215000
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 100
        at org.apache.mahout.classifier.df.mapreduce.partial.PartialBuilder.processOutput(PartialBuilder.java:126)
        at org.apache.mahout.classifier.df.mapreduce.partial.PartialBuilder.parseOutput(PartialBuilder.java:89)
        at org.apache.mahout.classifier.df.mapreduce.Builder.build(Builder.java:303)
        at org.apache.mahout.classifier.df.mapreduce.BuildForest.buildForest(BuildForest.java:201)
        at org.apache.mahout.classifier.df.mapreduce.BuildForest.run(BuildForest.java:163)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.mahout.classifier.df.mapreduce.BuildForest.main(BuildForest.java:225)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
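
For whoever triages this, the following is purely my guess at the kind of indexing mismatch that could explain the exception, sketched as standalone Java (IndexMismatchSketch and its variable names are hypothetical, not Mahout's actual code): if processOutput sizes its result array by the requested number of trees but indexes it with a per-partition key, the first partition id >= 100 overflows, matching ArrayIndexOutOfBoundsException: 100.

// Hypothetical sketch, not Mahout code: an array sized by the requested
// tree count but indexed by a partition id overflows as soon as there
// are more map partitions than trees.
public class IndexMismatchSketch {
  public static void main(String[] args) {
    int numTrees = 100;        // -t 100 on the command line
    int numPartitions = 1000;  // ~1000 map attempts observed in the log
    Object[] trees = new Object[numTrees];
    for (int partition = 0; partition < numPartitions; partition++) {
      // throws java.lang.ArrayIndexOutOfBoundsException: 100
      trees[partition] = "output of partition " + partition;
    }
  }
}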


PS: Compared to the commands on the wiki page, I adjusted the package name to org.apache.mahout.classifier.df and removed the -oob option.
