[
https://issues.apache.org/jira/browse/MAHOUT-1247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13679090#comment-13679090
]
Grant Ingersoll edited comment on MAHOUT-1247 at 6/9/13 3:47 PM:
-----------------------------------------------------------------
I think I see the issue. The cache file is "local", but the Iterator has a
Hadoop conf that expects an HDFS file, so it can't find the file.
For instance, the logs show:
{quote}11:38:49,638 INFO
org.apache.mahout.vectorizer.term.TFPartialVectorReducer: Cache Files:
[/tmp/hadoop-grantingersoll/mapred/local/taskTracker/distcache/2677051046998143225_1262960862_697707077/localhostdicVec/dictionary.file-0]
2013{quote}
Notice that the path is missing the scheme. I am going to try explicitly
setting the scheme to file://
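
To illustrate the missing-scheme problem, here is a minimal sketch using only
java.net.URI. It is not the Mahout fix itself: the class and method names
(CachePathScheme, isMissingScheme, qualifyAsLocal) and the short example path
are hypothetical, and it assumes an absolute path with no characters that need
URI escaping. A path with no scheme is resolved by Hadoop against the default
filesystem in the conf, which is why a local cache file can end up being looked
up on HDFS.

```java
import java.net.URI;

final class CachePathScheme {

    /** True when the path string carries no URI scheme such as "file" or "hdfs". */
    static boolean isMissingScheme(String path) {
        return URI.create(path).getScheme() == null;
    }

    /**
     * Returns the path unchanged if it already has a scheme; otherwise
     * prefixes file:// so the path is explicitly marked as local.
     * Assumes an absolute path without spaces or other characters
     * that would need escaping in a URI.
     */
    static URI qualifyAsLocal(String path) {
        URI uri = URI.create(path);
        if (uri.getScheme() != null) {
            return uri;
        }
        if (!path.startsWith("/")) {
            throw new IllegalArgumentException("expected an absolute path: " + path);
        }
        return URI.create("file://" + path);
    }

    public static void main(String[] args) {
        // Hypothetical, shortened stand-in for the distcache path in the log.
        String cacheFile = "/tmp/distcache/dictionary.file-0";
        System.out.println(isMissingScheme(cacheFile)); // prints "true"
        System.out.println(qualifyAsLocal(cacheFile));  // prints "file:///tmp/distcache/dictionary.file-0"
    }
}
```

Within Hadoop code, an alternative to rewriting the path might be to open the
cache file through the local filesystem obtained from FileSystem.getLocal(conf)
rather than the default one; whether that suits this Iterator is untested here.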
was (Author: gsingers):
I think I see the issue. The cache file is "local", but the Iterator has a
Hadoop conf that expects an HDFS file, so it can't find the file.
> cluster-reuters doesn't work on Hadoop
> --------------------------------------
>
> Key: MAHOUT-1247
> URL: https://issues.apache.org/jira/browse/MAHOUT-1247
> Project: Mahout
> Issue Type: Bug
> Reporter: Grant Ingersoll
> Assignee: Grant Ingersoll
> Fix For: 0.8
>
>
> At least two issues:
> 1. MAHOUT-992 messed up the Distributed Cache stuff somehow
> 2. The ExtractReuters data is not being moved to HDFS.