None of the clustering implementations hard-code the filesystem. The file names are constructed from the input and output file path arguments.

Jeff

Grant Ingersoll wrote:
I seem to recall this being something you have to set in your Hadoop configuration. Or, let me double check that we aren't hard-coding the FS in our Job.

-Grant

On Apr 15, 2009, at 1:27 PM, Stephen Green wrote:

On Apr 14, 2009, at 6:54 PM, Stephen Green wrote:
On Apr 14, 2009, at 5:17 PM, Grant Ingersoll wrote:

I would be concerned about the fact that EMR is using 0.18 and Mahout is on 0.19 (which of course raises another concern expressed by Owen O'Malley to me at ApacheCon: No one uses 0.19)

Well, I did run Mahout locally on a 0.18.3 install, but that was writing to and reading from HDFS. I can build a custom mahout-examples that has the 0.18.3 Hadoop jars (or perhaps no Hadoop jar at all...). I'm guessing that if EMR is on 0.18.3 and it gets popular, then you're going to have to deal with that problem.


More fun today. I checked out the mahout-0.1 release and rebuilt Mahout. I took the mahout-examples job, removed the Hadoop jar, and then tried to run the KMeans clustering against the synthetic control data. This failed with the same exception that I was originally getting yesterday:

java.lang.IllegalArgumentException: Wrong FS: s3n://mahout-output/, expected: hdfs://domU-12-31-38-01-C5-22.compute-1.internal:9000
       at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:320)
       at org.apache.hadoop.dfs.DistributedFileSystem.checkPath(DistributedFileSystem.java:84)
       at org.apache.hadoop.dfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:140)
       at org.apache.hadoop.dfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:408)
       at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:667)
       at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.runJob(Job.java:77)
       at org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.main(Job.java:43)
       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
       at java.lang.reflect.Method.invoke(Method.java:597)
       at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
       at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
       at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
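
For readers following the trace: the exception comes from a path check on the HDFS filesystem object, which was handed an s3n:// path. The sketch below is a simplified model of that kind of check using only java.net.URI. It is not Hadoop's actual code; the class name WrongFsSketch and the exact comparison rules are assumptions made for illustration.

```java
import java.net.URI;

/** Simplified, hypothetical model of a filesystem path check; not Hadoop's actual code. */
final class WrongFsSketch {

    /**
     * Rejects a path whose scheme or authority differs from the filesystem's URI.
     * Paths without a scheme are treated as belonging to this filesystem.
     */
    static void checkPath(URI fsUri, URI path) {
        String scheme = path.getScheme();
        if (scheme == null) {
            return; // e.g. "/user/output": no scheme, so nothing to compare
        }
        boolean sameScheme = scheme.equalsIgnoreCase(fsUri.getScheme());
        String authority = path.getAuthority();
        boolean sameAuthority = authority == null
                || authority.equalsIgnoreCase(fsUri.getAuthority());
        if (!(sameScheme && sameAuthority)) {
            throw new IllegalArgumentException(
                    "Wrong FS: " + path + ", expected: " + fsUri);
        }
    }

    public static void main(String[] args) {
        // Default filesystem URI taken from the exception message in the trace.
        URI defaultFs = URI.create("hdfs://domU-12-31-38-01-C5-22.compute-1.internal:9000");

        checkPath(defaultFs, URI.create("/user/output")); // accepted: no scheme

        try {
            // An s3n path checked against the HDFS filesystem is rejected.
            checkPath(defaultFs, URI.create("s3n://mahout-output/"));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In Hadoop itself, FileSystem.get(Configuration) returns the filesystem named by the default-filesystem setting, while Path.getFileSystem(Configuration) resolves the filesystem from the path's own scheme. The frame at Job.java:77 calling exists() on a DistributedFileSystem suggests the first form may be in use there, though that has not been checked against the Mahout source.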

Steve
--
Stephen Green                      //   [email protected]
Principal Investigator             \\   http://blogs.sun.com/searchguy
Aura Project                       //   Voice: +1 781-442-0926
Sun Microsystems Labs              \\   Fax:   +1 781-442-1692




--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:
http://www.lucidimagination.com/search



