[ https://issues.apache.org/jira/browse/MAHOUT-932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171893#comment-13171893 ]
Berttenfall M. commented on MAHOUT-932:
---------------------------------------

To clear that up: using ARFF files does not work at any step I tested (neither Describe nor BuildForest). So I converted the files to the UCI format (which is basically a header-free CSV file), and the classifier appears to like this format. As long as the array index stays in bounds. :-/

> RandomForest quits with ArrayIndexOutOfBoundsException while running sample
> ---------------------------------------------------------------------------
>
>                 Key: MAHOUT-932
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-932
>             Project: Mahout
>          Issue Type: Bug
>          Components: Classification
>    Affects Versions: 0.6
>         Environment: Mac OS X, current Mac OS shipped Java version, latest checkout from 17.12.2011
>                      Dual Core MacBook Pro 2009, 8 GB RAM, SSD
>            Reporter: Berttenfall M.
>            Priority: Minor
>              Labels: Classifier, DecisionForest, RandomForest
>
> Hello,
> when running the example under https://cwiki.apache.org/MAHOUT/partial-implementation.html with the recommended data sets, several issues occur.
> First: ARFF files no longer seem to be supported. I have been using the UCI format as recommended at https://cwiki.apache.org/MAHOUT/breiman-example.html. Using ARFF files, Mahout quits when creating the description file (wrong number of attributes in the string); using the UCI format it works.
> The main error happens during the BuildForest step (I could not test TestForest, as no forest was built).
> Running:
> $MAHOUT_HOME/bin/mahout org.apache.mahout.classifier.df.mapreduce.BuildForest -Dmapred.max.split.size=1874231 -d convertedData/data.data -ds KDDTrain+.info -sl 5 -p -t 100 -o nsl-forest
> I tested different split.size values: 1874231, 187423, and 18742 give the following error; 1874 does not finish on my machine (Dual Core MacBook Pro 2009, 8 GB RAM, SSD).
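The format point above (header-free UCI/CSV rows being accepted where ARFF input failed with "wrong number of attributes in the string") boils down to a token-count check against the attribute descriptor. A minimal sketch of that check, with hypothetical names and an illustrative data row, not Mahout's actual Describe code:

```java
// Illustrative sketch only: a header-free UCI/CSV row is accepted when its
// token count matches the number of attributes in the descriptor. This is
// the kind of check the ARFF input appeared to fail. Names are hypothetical.
public class AttributeCountSketch {

    // Returns true if the row has exactly the expected number of attributes.
    static boolean matchesDescriptor(String row, int expectedAttributes) {
        // UCI format: plain comma-separated values, no header line.
        return row.split(",").length == expectedAttributes;
    }

    public static void main(String[] args) {
        // Illustrative row in the header-free UCI/CSV style (not real data).
        String uciRow = "0,tcp,http,SF,181,5450,normal";
        System.out.println(matchesDescriptor(uciRow, 7)); // prints true
        System.out.println(matchesDescriptor(uciRow, 8)); // prints false
    }
}
```
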
> It quits after a while (map is almost done) with the following message:
> 11/12/17 16:23:24 INFO mapred.Task: Task 'attempt_local_0001_m_000998_0' done.
> 11/12/17 16:23:24 INFO mapred.Task: Task:attempt_local_0001_m_000999_0 is done. And is in the process of commiting
> 11/12/17 16:23:24 INFO mapred.LocalJobRunner:
> 11/12/17 16:23:24 INFO mapred.Task: Task attempt_local_0001_m_000999_0 is allowed to commit now
> 11/12/17 16:23:24 INFO output.FileOutputCommitter: Saved output of task 'attempt_local_0001_m_000999_0' to file:/Users/martin/Documents/Studium/Master/LargeScaleProcessing/Repository/mahout_algorithms_evaluation/testingRandomForests/nsl-forest
> 11/12/17 16:23:27 INFO mapred.LocalJobRunner:
> 11/12/17 16:23:27 INFO mapred.Task: Task 'attempt_local_0001_m_000999_0' done.
> 11/12/17 16:23:28 INFO mapred.JobClient: map 100% reduce 0%
> 11/12/17 16:23:28 INFO mapred.JobClient: Job complete: job_local_0001
> 11/12/17 16:23:28 INFO mapred.JobClient: Counters: 8
> 11/12/17 16:23:28 INFO mapred.JobClient:   File Output Format Counters
> 11/12/17 16:23:28 INFO mapred.JobClient:     Bytes Written=41869032
> 11/12/17 16:23:28 INFO mapred.JobClient:   FileSystemCounters
> 11/12/17 16:23:28 INFO mapred.JobClient:     FILE_BYTES_READ=37443033225
> 11/12/17 16:23:28 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=44946910704
> 11/12/17 16:23:28 INFO mapred.JobClient:   File Input Format Counters
> 11/12/17 16:23:28 INFO mapred.JobClient:     Bytes Read=20478569
> 11/12/17 16:23:28 INFO mapred.JobClient:   Map-Reduce Framework
> 11/12/17 16:23:28 INFO mapred.JobClient:     Map input records=125973
> 11/12/17 16:23:28 INFO mapred.JobClient:     Spilled Records=0
> 11/12/17 16:23:28 INFO mapred.JobClient:     Map output records=100000
> 11/12/17 16:23:28 INFO mapred.JobClient:     SPLIT_RAW_BYTES=215000
> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 100
>         at org.apache.mahout.classifier.df.mapreduce.partial.PartialBuilder.processOutput(PartialBuilder.java:126)
>         at org.apache.mahout.classifier.df.mapreduce.partial.PartialBuilder.parseOutput(PartialBuilder.java:89)
>         at org.apache.mahout.classifier.df.mapreduce.Builder.build(Builder.java:303)
>         at org.apache.mahout.classifier.df.mapreduce.BuildForest.buildForest(BuildForest.java:201)
>         at org.apache.mahout.classifier.df.mapreduce.BuildForest.run(BuildForest.java:163)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>         at org.apache.mahout.classifier.df.mapreduce.BuildForest.main(BuildForest.java:225)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>         at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>         at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> PS: I adjusted the class to .classifier.df. and removed -oop

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
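An observation on the trace: with -t 100 the exception is ArrayIndexOutOfBoundsException: 100, which is consistent with an array sized by the requested tree count being written one slot past its end while collecting map output (note the 100000 map output records in the counters). The following is a minimal sketch of that failure class only, using hypothetical names, not Mahout's actual PartialBuilder code:

```java
// Illustrative sketch only: an array sized by the requested tree count
// (-t 100) is over-indexed when the job output yields more records than
// expected. Throws ArrayIndexOutOfBoundsException: 100, matching the
// index reported in the stack trace. All names here are hypothetical.
public class PartialBuilderSketch {
    public static void main(String[] args) {
        int nbTrees = 100;          // value passed via -t
        int mapOutputRecords = 101; // one record too many is enough
        Object[] trees = new Object[nbTrees];
        for (int index = 0; index < mapOutputRecords; index++) {
            // Fails when index == nbTrees (i.e. index 100, length 100).
            trees[index] = new Object();
        }
    }
}
```
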