[ https://issues.apache.org/jira/browse/MAHOUT-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrew Palumbo resolved MAHOUT-1863.
------------------------------------
       Resolution: Fixed
         Assignee: Andrew Palumbo
    Fix Version/s: 0.13.0

thanks [~chu11]!

> cluster-syntheticcontrol.sh errors out with "Input path does not exist"
> -----------------------------------------------------------------------
>
>                 Key: MAHOUT-1863
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1863
>             Project: Mahout
>          Issue Type: Bug
>    Affects Versions: 0.12.0
>            Reporter: Albert Chu
>            Assignee: Andrew Palumbo
>            Priority: Minor
>             Fix For: 0.13.0
>
>
> Running cluster-syntheticcontrol.sh on 0.12.0 resulted in this error:
> {noformat}
> Exception in thread "main" org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://apex156:54310/user/achu/testdata
>     at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:323)
>     at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:265)
>     at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:387)
>     at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:301)
>     at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:318)
>     at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:196)
>     at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
>     at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>     at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
>     at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
>     at org.apache.mahout.clustering.conversion.InputDriver.runJob(InputDriver.java:108)
>     at org.apache.mahout.clustering.syntheticcontrol.fuzzykmeans.Job.run(Job.java:133)
>     at org.apache.mahout.clustering.syntheticcontrol.fuzzykmeans.Job.main(Job.java:62)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
>     at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
>     at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:152)
>     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> {noformat}
> It appears cluster-syntheticcontrol.sh breaks under 0.12.0 due to patch
> {noformat}
> commit 23267a0bef064f3351fd879274724bcb02333c4a
> {noformat}
> one change in question
> {noformat}
> -    $DFS -mkdir testdata
> +    $DFS -mkdir ${WORK_DIR}/testdata
> {noformat}
> now requires that the -p option be specified to -mkdir. This fix is simple.
> Another change:
> {noformat}
> -    $DFS -put ${WORK_DIR}/synthetic_control.data testdata
> +    $DFS -put ${WORK_DIR}/synthetic_control.data ${WORK_DIR}/testdata
> {noformat}
> appears to break the example b/c in:
> examples/src/main/java/org/apache/mahout/clustering/syntheticcontrol/fuzzykmeans/Job.java
> examples/src/main/java/org/apache/mahout/clustering/syntheticcontrol/kmeans/Job.java
> the file 'testdata' is hard coded into the example as just 'testdata'.
> ${WORK_DIR}/testdata needs to be passed in as an option.
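Editorial note: the two script changes discussed above can be sketched as follows. This is a minimal illustration, not the committed patch. The `WORK_DIR` default and the dry-run fallback for `DFS` are assumptions added so the snippet can be run without a cluster; only the `-mkdir -p` and `-put` invocations come from the issue text.

```shell
#!/usr/bin/env bash
# Sketch only. DFS defaults to a dry-run echo so this runs without a cluster;
# set DFS="hadoop fs" to issue the commands for real. The dry-run default and
# the WORK_DIR default are assumptions, not taken from the script.
WORK_DIR="${WORK_DIR:-/tmp/mahout-work-demo}"
DFS="${DFS:-echo DRY-RUN: hadoop fs}"

# -p creates missing parent directories, so the nested path no longer fails.
$DFS -mkdir -p "${WORK_DIR}/testdata"

# Upload the data set to the same directory the job is later told to read.
$DFS -put "${WORK_DIR}/synthetic_control.data" "${WORK_DIR}/testdata"
```

With the upload going to `${WORK_DIR}/testdata`, the job still has to be pointed at that path, since the Job.java files default to the relative name `testdata`.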
> Reverting the lines listed above fixes the problem. However, reverting them
> presumably reintroduces the original problem described in MAHOUT-1773.
> I originally attempted to fix this by simply passing the option
> "--input ${WORK_DIR}/testdata" to the command in the script. However, a
> number of other options become required once any one option is specified.
> I considered modifying the above Job.java files to take a minimal number of
> arguments and set the rest to defaults, but that would also have required
> changes to DefaultOptionCreator.java to make the required options optional,
> and I didn't want to go down the path of determining which options the
> other examples do and do not require.
> So I just passed every required option in cluster-syntheticcontrol.sh to
> fix this, using whatever defaults were hard coded into the Job.java files
> above.
> I'm sure there's a better way to do this, and I'm happy to supply a patch,
> but thought I'd start with this.
> GitHub pull request to be sent shortly.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)