I've run Dirichlet clustering using Mahout and I'm trying to view the
cluster dump. I'm using a combination of ClusterDumper,
DirichletOutputState, DirichletCluster and TestL1ModelClustering to help
with the output.

So far I've successfully read each file in each state-x output folder. The
vectors appear to be serialized as <Text, DirichletCluster> pairs in each
binary dump, which is fine. However, after debugging, it turns out that the
model for each DirichletCluster is null. This makes sense, since I'm only
creating empty key and value instances when reading from the dump file, as
follows:

SequenceFile.Reader reader = new SequenceFile.Reader(fileSystem, inputPath, conf);
Text key = (Text) reader.getKeyClass().newInstance();
DirichletCluster cluster = (DirichletCluster) reader.getValueClass().newInstance();

I then tried to populate the fields of the DirichletCluster by calling its
readFields(DataInput in) method on the raw file:

DataInput istream = new DataInputStream(new FileInputStream(new File(fileName)));
cluster.readFields(istream);

and I get a NullPointerException.
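
To make the situation concrete, here is a minimal sketch using only the
JDK, not Hadoop or Mahout. ToyCluster is a hypothetical stand-in for a
Writable such as DirichletCluster, and its two fields are invented for
illustration. It shows that an instance created reflectively starts out
empty, and that readFields only fills it in when the stream holds exactly
the bytes that write produced.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

class EmptyInstanceSketch {

    // Hypothetical stand-in for a Writable value class; not Mahout's DirichletCluster.
    static class ToyCluster {
        String model;      // stays null until readFields populates it
        double totalCount;

        void write(DataOutput out) throws IOException {
            out.writeUTF(model);
            out.writeDouble(totalCount);
        }

        void readFields(DataInput in) throws IOException {
            model = in.readUTF();
            totalCount = in.readDouble();
        }
    }

    // Serialize one record to a byte array.
    static byte[] serialize(ToyCluster c) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        c.write(new DataOutputStream(bytes));
        return bytes.toByteArray();
    }

    // Create an empty instance, then fill it from the record bytes.
    static ToyCluster deserialize(byte[] record) throws IOException {
        ToyCluster c = new ToyCluster();
        c.readFields(new DataInputStream(new ByteArrayInputStream(record)));
        return c;
    }

    public static void main(String[] args) throws Exception {
        ToyCluster written = new ToyCluster();
        written.model = "toy-model";
        written.totalCount = 3.0;
        byte[] record = serialize(written);

        // Reflection only calls the no-arg constructor, so nothing is populated yet.
        ToyCluster fresh = ToyCluster.class.getDeclaredConstructor().newInstance();
        System.out.println("after newInstance: model=" + fresh.model);

        // readFields succeeds because the stream starts at the record's own bytes.
        ToyCluster filled = deserialize(record);
        System.out.println("after readFields: model=" + filled.model);
    }
}
```

A SequenceFile also carries a header and sync markers around its records,
so a raw FileInputStream does not start at a serialized value. One possible
direction, not verified against this Mahout version, is to let the reader
fill the instances itself with reader.next(key, cluster) in a loop.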

Could I get a few suggestions on how to proceed here?

-----
--cheers
Delroy