Frank,

What did you mean by "there is a / missing between clustersOutput and
clusteredPoints in the path"?


I just tried two more ways of building the path:

new Path(clusterOutput+"/clusters"+"/clusteredPoints"+"/part-m-00000"),conf);
new Path(clusterOutput+"/clusters/clusteredPoints"+"/part-m-00000"),conf);

Both of them cause the following error message:
File newsClusters/clusters/clusters/clusteredPoints/part-m-00000 does not exist.

It seems to me that "clusteredPoints" inherently equals
"/clusters/clusteredPoints". The original code given in "Mahout in Action"
uses Cluster.CLUSTERED_POINTS_DIR. However, using it causes an error
as well, as I included in my previous post:
File newsClusters/clustersclusteredPoints/part-m-00000 does not exist.

This is really confusing.
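
One reading that would fit every error message in this thread is that
clusterOutput already holds "newsClusters/clusters" and that
Cluster.CLUSTERED_POINTS_DIR is "clusteredPoints" with no leading slash.
Both values are assumptions inferred from the messages, not read from the
Mahout source. Below is a minimal sketch of that reading using plain
strings only; the class name and the join helper are hypothetical.

```java
public class PathJoinSketch {
    // Assumed values, inferred from the error messages in this thread
    // (not taken from the Mahout source).
    static final String CLUSTER_OUTPUT = "newsClusters/clusters";
    static final String CLUSTERED_POINTS_DIR = "clusteredPoints";

    // Plain concatenation, like the + operator: no separator is added.
    static String concat(String... parts) {
        StringBuilder sb = new StringBuilder();
        for (String p : parts) {
            sb.append(p);
        }
        return sb.toString();
    }

    // Hypothetical helper: puts exactly one "/" between the pieces.
    static String join(String... parts) {
        return String.join("/", parts);
    }

    public static void main(String[] args) {
        // Mirrors the book's expression: the "/" between the two pieces is missing.
        System.out.println(concat(CLUSTER_OUTPUT, CLUSTERED_POINTS_DIR, "/part-m-00000"));
        // Mirrors the second attempt: "clusters" appears twice.
        System.out.println(concat(CLUSTER_OUTPUT, "/clusters/clusteredPoints", "/part-m-00000"));
        // One separator between each piece.
        System.out.println(join(CLUSTER_OUTPUT, CLUSTERED_POINTS_DIR, "part-m-00000"));
    }
}
```

If that reading is right, clusterOutput + "/" + Cluster.CLUSTERED_POINTS_DIR
+ "/part-m-00000" should point at the existing file. Hadoop's Path also has
constructors that take a parent and a child, for example
new Path(new Path(clusterOutput, Cluster.CLUSTERED_POINTS_DIR), "part-m-00000"),
which add the separator themselves.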

Thanks.


On Tue, Aug 9, 2011 at 1:22 PM, Frank Scholten <[email protected]> wrote:

> It seems there is a / missing between clustersOutput and
> clusteredPoints in the path.
>
> Cheers,
>
> Frank
>
> Sent from a Hungarian keyboard at Sziget festival
>
> On Tue, Aug 9, 2011 at 7:07 PM, eric skinner <[email protected]>
> wrote:
> > Hello,
> >
> > I am working through NewsKMeansClustering.java, an example given in
> > chapter 9 of Mahout in Action. I ran this program against a directory of
> > sequence files. The error output is as follows:
> >
> > Exception in thread "main" java.io.FileNotFoundException: File newsClusters/clustersclusteredPoints/part-m-00000 does not exist.
> >  at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361)
> >  at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
> >  at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:676)
> >  at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
> >  at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
> >  at mia.clustering.ch09.NewsKMeansClustering.main(NewsKMeansClustering.java:76)
> >
> > For reference, the directory structure generated by running this
> > program is as follows:
> >
> > ~/workspaceMahout1/recommender/newsClusters% ls
> >  canopy-centroids clusters df-count dictionary.file-0 frequency.file-0
> > tfidf-vectors tf-vectors tokenized-documents wordcount
> >  ~/workspaceMahout1/recommender/newsClusters/clusters/clusteredPoints% ls
> > part-m-00000
> >
> > Afterwards, I changed the code from the original
> >
> > new Path(clusterOutput+Cluster.CLUSTERED_POINTS_DIR+"/part-m-00000"), conf);
> >
> > to
> >
> > new Path(clusterOutput+"/clusteredPoints"+"/part-m-00000"), conf);
> >
> >
> > With this change the program runs without the above error. I would
> > like to know whether this is a bug in the original code, or whether
> > there are other hidden issues.
> >
>
