Ok, so I managed to write a VectorIterable implementation to draw data from my database. Now I'm trying to understand the output that kMeans (with Canopy input) produces. Someone please correct me if I'm mistaken. At first, I thought there were as many "cluster-i" directories as clusters detected in the dataset by the algorithm(s), until I printed out the contents of the "part-00000" file in each of them. It seems to store a <Writable> cluster ID followed by a <Writable> Cluster on each line. Are those all the actual clusters detected? If so, what's the reason behind the directory naming and its consecutive numbering? Does each "part-00000", in the different "cluster-i" directories, hold different clusters? And what about the "points" directory? I can tell it follows a <VectorID, Value> record format. What is that value supposed to represent? The ID of the cluster the vector belongs to, perhaps?
There really ought to be documentation about this somewhere. I don't know if I need some kind of permission, but I'm offering to write it and upload it to the Mahout wiki, or wherever it should go, once I finish my project.

Thanks in advance.

On Fri, Jun 26, 2009 at 1:54 PM, Sean Owen<[email protected]> wrote:
> All of Mahout is generally Hadoop/HDFS based. Taste is a bit of
> exception since it has a core that is independent of Hadoop and can
> use data from files, databases, etc. It also happens to have some
> clustering logic. So you can use, say, TreeClusteringRecommender to
> generate user clusters, based on data in a database. This isn't
> Mahout's primary clustering support, but, if it fits what you need, at
> least it is there.
>
> On Fri, Jun 26, 2009 at 12:21 PM, nfantone<[email protected]> wrote:
>> Thanks for the fast response, Grant.
>>
>> I am aware of what you pointed out about Taste. I just mentioned it to
>> make a reference to something similar to what I needed to
>> implement/use, namely the "DataModel" interface.
>>
>> I'm going to try the solution you suggested and write an
>> implementation of VectorIterable. Expect me to come back here for
>> feedback.
>
