On Tue, Aug 9, 2011 at 7:42 PM, eric skinner <[email protected]> wrote:
> Frank,
>
> what did you mean "there is a / missing between clustersOutput and
> clusteredPoints in the path."

The clustering job writes the clustered points to the subdirectory
'clusteredPoints' directly under the given output path.

> Afterwards, I changed the code from the original one
> new Path(clusterOutput+Cluster.CLUSTERED_POINTS_DIR +"/part-m-00000"), conf);
> to
> new Path(clusterOutput+"/clusteredPoints"+"/part-m-00000"), conf);

AFAIK the Cluster.CLUSTERED_POINTS_DIR constant does not start with a /,
so when you added the / in front of 'clusteredPoints' it worked.

You can also use a SequenceFileDirIterable to iterate through the points.

Frank

>
>
> I just tried two more approaches to setting up the paths:
> new Path(clusterOutput+"/clusters"+"/clusteredPoints"+"/part-m-00000"),conf);
> new Path(clusterOutput+"/clusters/clusteredPoints"+"/part-m-00000"),conf);
>
> Both of them cause the following error message:
> File newsClusters/clusters/clusters/clusteredPoints/part-m-00000 does not
> exist.
>
> It seems to me that "clusteredPoints" inherently equals
> "/clusters/clusteredPoints". The original code given in "Mahout in Action"
> uses Cluster.CLUSTERED_POINTS_DIR. However, its usage causes an error
> message as well, like the one I included in my previous post,
> *File newsClusters/clustersclusteredPoints/part-m-00000 does not exist*.
>
> This is really confusing.
>
> Thanks.
>
>
> On Tue, Aug 9, 2011 at 1:22 PM, Frank Scholten <[email protected]> wrote:
>
>> It seems there is a / missing between clustersOutput and
>> clusteredPoints in the path.
>>
>> Cheers,
>>
>> Frank
>>
>> Sent from a Hungarian keyboard at Sziget festival
>>
>> On Tue, Aug 9, 2011 at 7:07 PM, eric skinner <[email protected]>
>> wrote:
>> > Hello,
>> >
>> > I am practicing with NewsKMeansClustering.java, an example given in
>> > chapter 9 of Mahout in Action. I run this program against a directory of
>> > sequence files.
>> > The output error message is as follows:
>> >
>> > Exception in thread "main" java.io.FileNotFoundException: *File
>> > newsClusters/clustersclusteredPoints/part-m-00000 does not exist*.
>> > at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361)
>> > at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
>> > at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:676)
>> > at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
>> > at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
>> > at mia.clustering.ch09.NewsKMeansClustering.main(NewsKMeansClustering.java:76)
>> >
>> > For reference, the directory structure of the result generated after
>> > running this program is as follows:
>> >
>> > ~/workspaceMahout1/recommender/newsClusters% ls
>> > canopy-centroids clusters df-count dictionary.file-0 frequency.file-0
>> > tfidf-vectors tf-vectors tokenized-documents wordcount
>> > ~/workspaceMahout1/recommender/newsClusters/clusters/clusteredPoints% ls
>> > part-m-00000
>> >
>> > Afterwards, I changed the code from the original one
>> >
>> > new Path(clusterOutput+Cluster.CLUSTERED_POINTS_DIR +"/part-m-00000"), conf);
>> >
>> > to
>> >
>> > new Path(clusterOutput+"/clusteredPoints"+"/part-m-00000"), conf);
>> >
>> > The program then runs without giving the above error message. I
>> > would like to know whether this is a bug in the original code or whether
>> > there are any other hidden issues.
>> >
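
To make the path issue discussed in this thread concrete, here is a minimal sketch in plain Java that reproduces both strings without needing Hadoop or Mahout on the classpath. It assumes, as Frank suggests, that the constant holds "clusteredPoints" with no leading slash, and that clusterOutput is "newsClusters/clusters" (inferred from the error message). The class name and the helper methods concat and joined are hypothetical and exist only for illustration.

```java
public class ClusteredPointsPath {
    // Assumption: stands in for Cluster.CLUSTERED_POINTS_DIR, which per the
    // thread does not begin with a '/'.
    static final String CLUSTERED_POINTS_DIR = "clusteredPoints";

    // Plain concatenation, as in the original example code: no separator is
    // added between the output directory and the constant.
    static String concat(String clusterOutput) {
        return clusterOutput + CLUSTERED_POINTS_DIR + "/part-m-00000";
    }

    // Concatenation with an explicit '/' between the two components, which
    // matches the change that made the program work.
    static String joined(String clusterOutput) {
        return clusterOutput + "/" + CLUSTERED_POINTS_DIR + "/part-m-00000";
    }

    public static void main(String[] args) {
        // Assumption: value inferred from the FileNotFoundException message.
        String clusterOutput = "newsClusters/clusters";

        // prints newsClusters/clustersclusteredPoints/part-m-00000
        System.out.println(concat(clusterOutput));

        // prints newsClusters/clusters/clusteredPoints/part-m-00000
        System.out.println(joined(clusterOutput));
    }
}
```

In real code, one way to avoid the problem is to let Hadoop insert the separator by using the two-argument Path constructor, for example new Path(new Path(clusterOutput, Cluster.CLUSTERED_POINTS_DIR), "part-m-00000"), instead of building the string by hand.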
