[ https://issues.apache.org/jira/browse/MAHOUT-932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171893#comment-13171893 ]
Berttenfall M. commented on MAHOUT-932:
---------------------------------------

To clear that up: using ARFF files does not work at any step I tested (neither Describe nor BuildForest). So I converted the files to the UCI format (which is basically a header-free CSV file), and the classifier appears to like this format. As long as the array index stays in bounds. :-/

> RandomForest quits with ArrayIndexOutOfBoundsException while running sample
> ---------------------------------------------------------------------------
>
>                 Key: MAHOUT-932
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-932
>             Project: Mahout
>          Issue Type: Bug
>          Components: Classification
>    Affects Versions: 0.6
>         Environment: Mac OS X, current Mac OS shipped Java version, latest checkout from 17.12.2011
>                      Dual Core MacBook Pro 2009, 8 GB RAM, SSD
>            Reporter: Berttenfall M.
>            Priority: Minor
>              Labels: Classifier, DecisionForest, RandomForest
>
> Hello,
> when running the example under https://cwiki.apache.org/MAHOUT/partial-implementation.html with the recommended data sets, several issues occur.
> First: ARFF files no longer seem to be supported. I have been using the UCI format as recommended at https://cwiki.apache.org/MAHOUT/breiman-example.html. Using ARFF files, Mahout quits when creating the description file (wrong number of attributes in the string); using the UCI format it works.
> The main error happens during the BuildForest step (I could not test TestForest, as no forest was built).
> Running:
> $MAHOUT_HOME/bin/mahout org.apache.mahout.classifier.df.mapreduce.BuildForest -Dmapred.max.split.size=1874231 -d convertedData/data.data -ds KDDTrain+.info -sl 5 -p -t 100 -o nsl-forest
> I tested different split.size values: 1874231, 187423, and 18742 give the following error; 1874 does not finish on my machine (Dual Core MacBook Pro 2009, 8 GB RAM, SSD).
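The format point above (header-free UCI/CSV rows being accepted where ARFF input failed with "wrong number of attributes in the string") boils down to a token-count check against the attribute descriptor. A minimal sketch of that check, with hypothetical names and an illustrative data row, not Mahout's actual Describe code:

```java
// Illustrative sketch only: a header-free UCI/CSV row is accepted when its
// token count matches the number of attributes in the descriptor. This is
// the kind of check the ARFF input appeared to fail. Names are hypothetical.
public class AttributeCountSketch {

    // Returns true if the row has exactly the expected number of attributes.
    static boolean matchesDescriptor(String row, int expectedAttributes) {
        // UCI format: plain comma-separated values, no header line.
        return row.split(",").length == expectedAttributes;
    }

    public static void main(String[] args) {
        // Illustrative row in the header-free UCI/CSV style (not real data).
        String uciRow = "0,tcp,http,SF,181,5450,normal";
        System.out.println(matchesDescriptor(uciRow, 7)); // prints true
        System.out.println(matchesDescriptor(uciRow, 8)); // prints false
    }
}
```
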
> It quits after a while (map is almost done) with the following message:
> 11/12/17 16:23:24 INFO mapred.Task: Task 'attempt_local_0001_m_000998_0' done.
> 11/12/17 16:23:24 INFO mapred.Task: Task:attempt_local_0001_m_000999_0 is done. And is in the process of commiting
> 11/12/17 16:23:24 INFO mapred.LocalJobRunner:
> 11/12/17 16:23:24 INFO mapred.Task: Task attempt_local_0001_m_000999_0 is allowed to commit now
> 11/12/17 16:23:24 INFO output.FileOutputCommitter: Saved output of task 'attempt_local_0001_m_000999_0' to file:/Users/martin/Documents/Studium/Master/LargeScaleProcessing/Repository/mahout_algorithms_evaluation/testingRandomForests/nsl-forest
> 11/12/17 16:23:27 INFO mapred.LocalJobRunner:
> 11/12/17 16:23:27 INFO mapred.Task: Task 'attempt_local_0001_m_000999_0' done.
> 11/12/17 16:23:28 INFO mapred.JobClient: map 100% reduce 0%
> 11/12/17 16:23:28 INFO mapred.JobClient: Job complete: job_local_0001
> 11/12/17 16:23:28 INFO mapred.JobClient: Counters: 8
> 11/12/17 16:23:28 INFO mapred.JobClient:   File Output Format Counters
> 11/12/17 16:23:28 INFO mapred.JobClient:     Bytes Written=41869032
> 11/12/17 16:23:28 INFO mapred.JobClient:   FileSystemCounters
> 11/12/17 16:23:28 INFO mapred.JobClient:     FILE_BYTES_READ=37443033225
> 11/12/17 16:23:28 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=44946910704
> 11/12/17 16:23:28 INFO mapred.JobClient:   File Input Format Counters
> 11/12/17 16:23:28 INFO mapred.JobClient:     Bytes Read=20478569
> 11/12/17 16:23:28 INFO mapred.JobClient:   Map-Reduce Framework
> 11/12/17 16:23:28 INFO mapred.JobClient:     Map input records=125973
> 11/12/17 16:23:28 INFO mapred.JobClient:     Spilled Records=0
> 11/12/17 16:23:28 INFO mapred.JobClient:     Map output records=100000
> 11/12/17 16:23:28 INFO mapred.JobClient:     SPLIT_RAW_BYTES=215000
> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 100
>         at org.apache.mahout.classifier.df.mapreduce.partial.PartialBuilder.processOutput(PartialBuilder.java:126)
>         at org.apache.mahout.classifier.df.mapreduce.partial.PartialBuilder.parseOutput(PartialBuilder.java:89)
>         at org.apache.mahout.classifier.df.mapreduce.Builder.build(Builder.java:303)
>         at org.apache.mahout.classifier.df.mapreduce.BuildForest.buildForest(BuildForest.java:201)
>         at org.apache.mahout.classifier.df.mapreduce.BuildForest.run(BuildForest.java:163)
>         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>         at org.apache.mahout.classifier.df.mapreduce.BuildForest.main(BuildForest.java:225)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>         at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>         at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> PS: I adjusted the class to .classifier.df. and removed -oop

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
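An observation on the trace: with -t 100 the exception is ArrayIndexOutOfBoundsException: 100, which is consistent with an array sized by the requested tree count being written one slot past its end while collecting map output (note the 100000 map output records in the counters). The following is a minimal sketch of that failure class only, using hypothetical names, not Mahout's actual PartialBuilder code:

```java
// Illustrative sketch only: an array sized by the requested tree count
// (-t 100) is over-indexed when the job output yields more records than
// expected. Throws ArrayIndexOutOfBoundsException: 100, matching the
// index reported in the stack trace. All names here are hypothetical.
public class PartialBuilderSketch {
    public static void main(String[] args) {
        int nbTrees = 100;          // value passed via -t
        int mapOutputRecords = 101; // one record too many is enough
        Object[] trees = new Object[nbTrees];
        for (int index = 0; index < mapOutputRecords; index++) {
            // Fails when index == nbTrees (i.e. index 100, length 100).
            trees[index] = new Object();
        }
    }
}
```
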