Frank, what did you mean by "there is a / missing between clustersOutput and clusteredPoints in the path"?
I just tried two more ways of setting up the path:

new Path(clusterOutput+"/clusters"+"/clusteredPoints"+"/part-m-00000"), conf);
new Path(clusterOutput+"/clusters/clusteredPoints"+"/part-m-00000"), conf);

Both of them cause the following error message:

File newsClusters/clusters/clusters/clusteredPoints/part-m-00000 does not exist.

It seems to me that "clusteredPoints" inherently equals "/clusters/clusteredPoints". The original code given in "Mahout in Action" uses Cluster.CLUSTERED_POINTS_DIR. However, using it causes an error message as well, the one I included in my previous post:

File newsClusters/clustersclusteredPoints/part-m-00000 does not exist.

This is really confusing.

Thanks.

On Tue, Aug 9, 2011 at 1:22 PM, Frank Scholten <[email protected]> wrote:
> It seems there is a / missing between clustersOutput and
> clusteredPoints in the path.
>
> Cheers,
>
> Frank
>
> Sent from a Hungarian keyboard at Sziget festival
>
> On Tue, Aug 9, 2011 at 7:07 PM, eric skinner <[email protected]>
> wrote:
> > Hello,
> >
> > I am practicing the NewsKMeansClustering.java, an example code given in
> > chapter 9 of Mahout-in-Action? I run this program against a directory of
> > sequence files. The output error message is as follows:
> >
> > Exception in thread "main" java.io.FileNotFoundException: File
> > newsClusters/clustersclusteredPoints/part-m-00000 does not exist.
> > at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361)
> > at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
> > at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:676)
> > at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
> > at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
> > at mia.clustering.ch09.NewsKMeansClustering.main(NewsKMeansClustering.java:76)
> >
> > As reference, the directory structure of the result generated after running
> > this program is shown as follows as well:
> >
> > ~/workspaceMahout1/recommender/newsClusters% ls
> > canopy-centroids clusters df-count dictionary.file-0 frequency.file-0
> > tfidf-vectors tf-vectors tokenized-documents wordcount
> > ~/workspaceMahout1/recommender/newsClusters/clusters/clusteredPoints% ls
> > part-m-00000
> >
> > Afterwards, I change the code from the original one
> >
> > new Path(clusterOutput+Cluster.CLUSTERED_POINTS_DIR +"/part-m-00000"), conf);
> >
> > to
> >
> > new Path(clusterOutput+"/clusteredPoints"+"/part-m-00000"), conf);
> >
> > The program can go through without giving the above error messages. I would
> > like to know is that a bug in the original code or are there any other
> > hidden issues?
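
A note on the error messages quoted in this thread: they would all be explained if clusterOutput already holds "newsClusters/clusters" and Cluster.CLUSTERED_POINTS_DIR holds "clusteredPoints" with no leading slash. Both values are assumptions inferred from the messages, not checked against the Mahout source. The sketch below uses plain Java strings as stand-ins for those two values (the class and method names are hypothetical) and shows how each attempted concatenation produces the path named in its error, and how inserting the separator produces the path that exists in the directory listing.

```java
public class ClusteredPointsPathDemo {
    // Assumed values, inferred from the error messages in this thread.
    static final String CLUSTER_OUTPUT = "newsClusters/clusters";
    // Assumed: the constant carries no leading "/".
    static final String CLUSTERED_POINTS_DIR = "clusteredPoints";

    // Plain concatenation, as in the original `a + b + c` expressions.
    static String concat(String... parts) {
        StringBuilder sb = new StringBuilder();
        for (String part : parts) {
            sb.append(part);
        }
        return sb.toString();
    }

    // Joins the components with an explicit "/" between each pair.
    static String join(String... parts) {
        return String.join("/", parts);
    }

    public static void main(String[] args) {
        // Book's expression: no "/" between the two directory names.
        System.out.println(concat(CLUSTER_OUTPUT, CLUSTERED_POINTS_DIR, "/part-m-00000"));
        // Second attempt: adds a second "/clusters" component.
        System.out.println(concat(CLUSTER_OUTPUT, "/clusters", "/clusteredPoints", "/part-m-00000"));
        // With the separator supplied between every component.
        System.out.println(join(CLUSTER_OUTPUT, CLUSTERED_POINTS_DIR, "part-m-00000"));
    }
}
```

With Hadoop itself, the two-argument constructors Path(String parent, String child) and Path(Path parent, String child) insert the separator, so something like new Path(new Path(clusterOutput, Cluster.CLUSTERED_POINTS_DIR), "part-m-00000") should avoid hand-built slashes, assuming the values above.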
