[ https://issues.apache.org/jira/browse/MAHOUT-602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12988673#comment-12988673 ]
Lance Norskog commented on MAHOUT-602:
--------------------------------------
Scenario:
I attempted to follow the tutorial on the wiki. The wiki page apparently
predates the 'bin/mahout' shell script, so I used that script instead of the
bin/hadoop commands given on the page.
The command given to create a "file descriptor" did not work. Its -d argument
is a string of letters and counts that describes the columns of the data file,
and that string is wrong.
{code}
bin/mahout org.apache.mahout.df.tools.Describe -p testdata/KDDTrain+.arff \
  -f testdata/KDDTrain+.info -d N 3 C 2 N C 4 N C 8 N 2 C 19 N L
{code}
This throws an exception. Adding a '1' (one) to the beginning of the descriptor
string made the command run, so this appears to be the correct fix.
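For reference, in the Mahout decision-forest tools the descriptor letters mean
N = numerical, C = categorical, L = label and I = ignored, and a number in front
of a letter repeats it. One way to sanity-check a descriptor is to expand it and
compare the column count with the ARFF header. The Python sketch below does
that; expand_descriptor is a hypothetical helper written for illustration, not
Mahout's own parser, and it assumes a letter without a count stands for one
column.

```python
from typing import List

# Column types accepted by this sketch: Ignored, Numerical, Categorical, Label.
VALID_TOKENS = {"I", "N", "C", "L"}


def expand_descriptor(descriptor: str) -> List[str]:
    """Expand a compact descriptor such as 'N 3 C L' into one letter per column.

    Hypothetical helper for illustration only (not Mahout's parser). A number
    repeats the letter that follows it; a letter without a number counts once.
    """
    columns: List[str] = []
    count = None
    for token in descriptor.split():
        if token.isdigit():
            if count is not None:
                raise ValueError(f"two counts in a row at {token!r}")
            count = int(token)
        elif token in VALID_TOKENS:
            columns.extend([token] * (1 if count is None else count))
            count = None
        else:
            raise ValueError(f"unknown token {token!r}")
    if count is not None:
        raise ValueError("descriptor ends with a count but no letter")
    return columns


cols = expand_descriptor("N 3 C 2 N C 4 N C 8 N 2 C 19 N L")
print(len(cols), cols.count("N"), cols.count("C"), cols.count("L"))  # prints: 42 34 7 1
```

Under these rules the wiki string expands to 42 columns (34 numerical,
7 categorical, 1 label), and prefixing it with '1' leaves the expansion
unchanged, since '1 N' and 'N' both denote a single numerical column.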
After this ran, I tried the next command:
{code}
bin/mahout org.apache.mahout.df.mapreduce.BuildForest \
  -Dmapred.max.split.size=1874231 -oob -d testdata/KDDTrain+_20Percent.arff \
  -ds testdata/KDDTrain+_20Percent.info -sl 5 -p -t 100 -o nsl-forest
{code}
This throws an ArrayIndexOutOfBoundsException. I tried the same job with
different values of mapred.max.split.size and got the same exception, with
index 100 each time.
The full logfile is attached as partialImp_fullKDD_errors.log.
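Varying mapred.max.split.size changes how the input is cut into map tasks. As a
rough model of that relationship, the sketch below mirrors the split-size rule
in Hadoop's new-API FileInputFormat: the split size is
max(minSize, min(maxSize, blockSize)), and the last split may be up to 10%
larger (SPLIT_SLOP = 1.1). split_sizes is a hypothetical helper; it ignores
compression, multiple input files and data locality, and the 10,000,000-byte
file and 64 MiB block size in the example are made-up values, not taken from
this report.

```python
from typing import List

# Hadoop's FileInputFormat allows the final split to be up to 10% larger
# than the nominal split size instead of emitting a tiny trailing split.
SPLIT_SLOP = 1.1


def split_sizes(file_size: int, max_split: int, block_size: int,
                min_split: int = 1) -> List[int]:
    """Simplified model of how one uncompressed file is divided into splits.

    Hypothetical helper for illustration; each returned entry is the byte
    length of one split, and in a FileInputFormat job each split is
    processed by one map task.
    """
    split = max(min_split, min(max_split, block_size))
    sizes: List[int] = []
    remaining = file_size
    # Emit full-size splits while more than 1.1 splits' worth of data remains.
    while remaining / split > SPLIT_SLOP:
        sizes.append(split)
        remaining -= split
    # Whatever is left (at most 1.1 x split) becomes the last split.
    if remaining != 0:
        sizes.append(remaining)
    return sizes


# Hypothetical input: a 10,000,000-byte file on 64 MiB blocks, using the
# mapred.max.split.size value from the BuildForest command above.
sizes = split_sizes(10_000_000, 1_874_231, 64 * 1024 * 1024)
print(len(sizes), sizes[-1])  # prints: 6 628845
```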
> "Partial Implementation" throws exceptions
> ------------------------------------------
>
> Key: MAHOUT-602
> URL: https://issues.apache.org/jira/browse/MAHOUT-602
> Project: Mahout
> Issue Type: Bug
> Environment: Mac OS X
> java version "1.6.0_22"
> Java(TM) SE Runtime Environment (build 1.6.0_22-b04-307-10M3261)
> Java HotSpot(TM) 64-Bit Server VM (build 17.1-b03-307, mixed mode)
> Reporter: Lance Norskog
> Attachments: partialImp_fullKDD_errors.log
>
>
> The "Partial Implementation" described on the wiki page [Partial
> Implementation|https://cwiki.apache.org/confluence/display/MAHOUT/Partial+Implementation]
> fails with the given dataset and operations.