[
https://issues.apache.org/jira/browse/MAHOUT-1594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14085055#comment-14085055
]
ASF GitHub Bot commented on MAHOUT-1594:
----------------------------------------
Github user andrewpalumbo commented on the pull request:
https://github.com/apache/mahout/pull/38#issuecomment-51099050
Hello, could you please add a check for the $MAHOUT_LOCAL environment
variable so that the script can run against both the local and HDFS file systems?
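For concreteness, one way the requested check could look (the helper names below are illustrative, not taken from the actual patch):

```shell
# Sketch of a MAHOUT_LOCAL-aware filesystem shim for the example script.
# Mahout treats a non-empty MAHOUT_LOCAL as "run on the local filesystem",
# so each helper picks a plain shell command or its `hadoop fs` equivalent
# at call time.
fs_mkdir() {
  if [ -n "$MAHOUT_LOCAL" ]; then
    mkdir -p "$1"
  else
    hadoop fs -mkdir -p "$1"
  fi
}

fs_rm() {
  if [ -n "$MAHOUT_LOCAL" ]; then
    rm -rf "$1"
  else
    hadoop fs -rm -r "$1"
  fi
}

fs_put() {
  if [ -n "$MAHOUT_LOCAL" ]; then
    cp "$1" "$2"
  else
    hadoop fs -put "$1" "$2"
  fi
}
```

The rest of the script can then call `fs_mkdir`, `fs_rm`, and `fs_put` unconditionally, and the same code path works in both modes.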
-------- Original message --------
From: roengram <[email protected]>
Date: 08/04/2014 2:45 AM (GMT-05:00)
To: apache/mahout <[email protected]>
Subject: [mahout] MAHOUT-1594 (#38)
Detail: https://issues.apache.org/jira/browse/MAHOUT-1594
The factorization example doesn't work correctly with Hadoop version
2.4.0.2.1.1.0-385 because the script uses local paths for its working
directories. I've changed all references to local directories to HDFS ones
and replaced the Linux shell commands with their hadoop equivalents.
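To illustrate the flavor of substitution involved (the paths here are representative; the actual changes are in the PR diff, and `DRY_RUN` is a wrapper added here so the snippet can be previewed without a Hadoop installation):

```shell
# Illustrative before/after of the kind of change described above.
# DRY_RUN defaults to `echo`, printing the commands instead of running them;
# set DRY_RUN= (empty) on a real cluster to execute them.
DRY_RUN=${DRY_RUN:-echo}
WORK_DIR=/tmp/mahout-work-${USER}

# Before: working directory prepared on the local filesystem.
#   mkdir -p ${WORK_DIR}/movielens
# After: the same step against HDFS.
${DRY_RUN} hadoop fs -mkdir -p ${WORK_DIR}/movielens

# Before: ratings copied with a plain shell command.
#   cp ${WORK_DIR}/ml-1m/ratings.dat ${WORK_DIR}/movielens/
# After: the hadoop equivalent.
${DRY_RUN} hadoop fs -put ${WORK_DIR}/ml-1m/ratings.dat ${WORK_DIR}/movielens/
```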
You can merge this Pull Request by running:
git pull https://github.com/roengram/mahout MAHOUT.1594
Or you can view, comment on it, or merge it online at:
https://github.com/apache/mahout/pull/38
-- Commit Summary --
* Use HDFS instead of local dir
-- File Changes --
M examples/bin/factorize-movielens-1M.sh (14)
-- Patch Links --
https://github.com/apache/mahout/pull/38.patch
https://github.com/apache/mahout/pull/38.diff
---
Reply to this email directly or view it on GitHub:
https://github.com/apache/mahout/pull/38
> Example factorize-movielens-1M.sh does not use HDFS
> ---------------------------------------------------
>
> Key: MAHOUT-1594
> URL: https://issues.apache.org/jira/browse/MAHOUT-1594
> Project: Mahout
> Issue Type: Bug
> Components: Examples
> Affects Versions: 0.9
> Environment: Hadoop version: 2.4.0.2.1.1.0-385
> Git hash: 2b65475c3ab682ebd47cffdc6b502698799cd2c8 (trunk)
> Reporter: jaehoon ko
> Priority: Minor
> Labels: newbie, patch
> Fix For: 1.0
>
> Attachments: MAHOUT-1594.patch
>
>
> It seems that factorize-movielens-1M.sh does not use HDFS at all. All paths
> look like local paths, not HDFS ones, so the example crashes immediately
> because it cannot find the input data on HDFS:
> {code}
> Exception in thread "main" org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: /tmp/mahout-work-hoseog.lee/movielens/ratings.csv
>     at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:320)
>     at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:263)
>     at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:375)
>     at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:493)
>     at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:510)
>     at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:394)
>     at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
>     at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1557)
>     at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
>     at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1303)
>     at org.apache.mahout.cf.taste.hadoop.als.DatasetSplitter.run(DatasetSplitter.java:94)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
>     at org.apache.mahout.cf.taste.hadoop.als.DatasetSplitter.main(DatasetSplitter.java:64)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
>     at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:145)
>     at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:153)
>     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> {code}
--
This message was sent by Atlassian JIRA
(v6.2#6252)