So I've run Dirichlet clustering using Mahout and I'm trying to look at the cluster dump. I'm using a combination of ClusterDumper, DirichletOutputState, DirichletCluster and TestL1ModelClustering to help with the output.
So far I've successfully read each file in each state-x output folder. The records appear to be serialized as <Text, DirichletCluster> pairs in each binary dump, which is fine. However, after debugging it turns out that the model for each DirichletCluster is null... and this makes sense, since I'm reading from the dump file as follows:

    SequenceFile.Reader reader = new SequenceFile.Reader(fileSystem, inputPath, conf);
    Text key = (Text) reader.getKeyClass().newInstance();
    DirichletCluster cluster = (DirichletCluster) reader.getValueClass().newInstance();

I tried to set the fields of the DirichletCluster using its readFields(DataInput in) method:

    DataInput istream = new DataInputStream(new FileInputStream(new File(fileName)));
    cluster.readFields(istream);

and I get a NullPointerException. Can I have a few suggestions on how to proceed here?

-----
cheers
Delroy

--
View this message in context: http://lucene.472066.n3.nabble.com/Dirichlet-ClusterDump-Output-tp777637p777637.html
Sent from the Mahout User List mailing list archive at Nabble.com.
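One plausible reading of the NullPointerException, offered as an assumption rather than a confirmed diagnosis: `new FileInputStream(...)` hands `readFields` the file starting at byte 0, and a Hadoop SequenceFile begins with a header (magic bytes, version, key/value class names, and so on) rather than the first record. `DirichletCluster.readFields` would therefore be parsing header bytes as cluster fields. The sketch below illustrates that failure mode with a deliberately simplified, hypothetical format. `ToyWritable` mirrors only the `write(DataOutput)` / `readFields(DataInput)` contract of Hadoop's `Writable`. `ToyText`, `ToyCluster`, `ToyReader`, `writeFile` and the `TOYSEQ` magic string are made up for illustration and are not the real SequenceFile layout or the Mahout API.

```java
import java.io.*;
import java.util.Arrays;
import java.util.List;

public class ToySequenceFileDemo {

    /** Hypothetical stand-in for Hadoop's Writable contract. */
    interface ToyWritable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    /** Hypothetical key type playing the role of Text. */
    static class ToyText implements ToyWritable {
        String value;
        public void write(DataOutput out) throws IOException { out.writeUTF(value); }
        public void readFields(DataInput in) throws IOException { value = in.readUTF(); }
    }

    /** Hypothetical value type: model stays null until readFields fills it in. */
    static class ToyCluster implements ToyWritable {
        String model;
        double totalCount;
        public void write(DataOutput out) throws IOException {
            out.writeUTF(model);
            out.writeDouble(totalCount);
        }
        public void readFields(DataInput in) throws IOException {
            model = in.readUTF();
            totalCount = in.readDouble();
        }
    }

    static final String MAGIC = "TOYSEQ";

    /** Writes a header (magic string + record count), then the key/value records. */
    static byte[] writeFile(List<ToyText> keys, List<ToyCluster> values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeUTF(MAGIC);
        out.writeInt(keys.size());
        for (int i = 0; i < keys.size(); i++) {
            keys.get(i).write(out);
            values.get(i).write(out);
        }
        out.flush();
        return bytes.toByteArray();
    }

    /** Consumes the header once, then hands out one record per next() call. */
    static class ToyReader {
        private final DataInputStream in;
        private int remaining;
        ToyReader(InputStream raw) throws IOException {
            in = new DataInputStream(raw);
            if (!MAGIC.equals(in.readUTF())) {
                throw new IOException("not a toy sequence file");
            }
            remaining = in.readInt();
        }

        /** Fills key and value from the next record; returns false once none are left. */
        boolean next(ToyWritable key, ToyWritable value) throws IOException {
            if (remaining == 0) {
                return false;
            }
            key.readFields(in);
            value.readFields(in);
            remaining--;
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        ToyText k = new ToyText();
        k.value = "cluster-0";
        ToyCluster c = new ToyCluster();
        c.model = "L1Model";
        c.totalCount = 42.0;
        byte[] file = writeFile(Arrays.asList(k), Arrays.asList(c));

        // Wrong: readFields on the raw stream starts at byte 0, i.e. inside the header.
        ToyCluster raw = new ToyCluster();
        raw.readFields(new DataInputStream(new ByteArrayInputStream(file)));
        System.out.println("raw read model = " + raw.model); // the header magic, not the model

        // Right: let the reader skip the header, then iterate over the records.
        ToyReader reader = new ToyReader(new ByteArrayInputStream(file));
        ToyText key = new ToyText();
        ToyCluster value = new ToyCluster();
        while (reader.next(key, value)) {
            System.out.println(key.value + " -> " + value.model + " (" + value.totalCount + ")");
        }
    }
}
```

Running `main` prints `raw read model = TOYSEQ` and then `cluster-0 -> L1Model (42.0)`. The raw read "succeeds" but fills the cluster with header bytes. With the real classes, the analogous change would be to drop the `FileInputStream`/`readFields` step and loop with `while (reader.next(key, cluster)) { ... }` on the `SequenceFile.Reader` already opened above. That reader consumes the SequenceFile header before deserializing each record. This is untested against any particular Mahout release.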