[ 
https://issues.apache.org/jira/browse/MAHOUT-602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12988673#comment-12988673
 ] 

Lance Norskog commented on MAHOUT-602:
--------------------------------------

Scenario:

I attempted to follow the tutorial on the wiki. The wiki page apparently 
predates the 'bin/mahout' shell script. I used that script instead of the 
bin/hadoop commands given on the page.

The command given to create a "file descriptor" did not work. Its -d argument 
is a string of letters and counts that describes the columns of a CSV file. 
The string given on the wiki is wrong.
{code}
bin/mahout org.apache.mahout.df.tools.Describe -p testdata/KDDTrain+.arff 
-f testdata/KDDTrain+.info -d N 3 C 2 N C 4 N C 8 N 2 C 19 N L
{code}
This throws an exception. I added a '1' (one) to the beginning of the 
descriptor string and that made it run. This seems to be the correct fix.
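
For reference, here is a minimal sketch of how I read the descriptor shorthand. 
It assumes that a number applies to the single type letter that follows it 
(so "3 C" means "C C C") and that a letter with no number counts once. The 
function name expand_descriptor is made up for this illustration and is not 
part of Mahout; the actual parsing in Describe may differ.

```shell
# Hypothetical helper, not part of Mahout: expands the count-prefixed
# descriptor shorthand into one type letter per attribute.
# Assumption: a numeric token is a repeat count for the next letter only.
expand_descriptor() {
  count=1
  out=""
  for tok in "$@"; do
    case "$tok" in
      ''|*[!0-9]*)
        # Not a pure number: treat it as a type letter, emit it 'count' times.
        i=0
        while [ "$i" -lt "$count" ]; do
          out="$out $tok"
          i=$((i + 1))
        done
        count=1
        ;;
      *)
        # Pure number: remember it as the repeat count for the next letter.
        count=$tok
        ;;
    esac
  done
  # Drop the leading space added by the first append.
  echo "${out# }"
}

# The descriptor from the wiki, and the variant with the leading '1'.
expand_descriptor N 3 C 2 N C 4 N C 8 N 2 C 19 N L
expand_descriptor 1 N 3 C 2 N C 4 N C 8 N 2 C 19 N L
```

Under that reading, both calls print the same list of 42 letters, so the 
leading '1' would not change which attribute types are described.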

After this ran, I tried the next command:
{code}
bin/mahout org.apache.mahout.df.mapreduce.BuildForest 
-Dmapred.max.split.size=1874231 -oob -d testdata/KDDTrain+_20Percent.arff 
-ds testdata/KDDTrain+_20Percent.info -sl 5 -p -t 100 -o nsl-forest
{code}
This throws an ArrayIndexOutOfBoundsException.

I tried the same job with different values of mapred.max.split.size and got 
the same exception each time, with the same out-of-bounds index of '100'.

The logfile is attached as partialImp_fullKDD_errors.log.

> "Partial Implementation" throws exceptions
> ------------------------------------------
>
>                 Key: MAHOUT-602
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-602
>             Project: Mahout
>          Issue Type: Bug
>         Environment: Mac OS X
> java version "1.6.0_22"
> Java(TM) SE Runtime Environment (build 1.6.0_22-b04-307-10M3261)
> Java HotSpot(TM) 64-Bit Server VM (build 17.1-b03-307, mixed mode)
>            Reporter: Lance Norskog
>         Attachments: partialImp_fullKDD_errors.log
>
>
> The "Partial Implementation" described on the wiki page [Partial 
> Implementation|https://cwiki.apache.org/confluence/display/MAHOUT/Partial+Implementation]
>  fails with the given dataset and operations.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
