I have already customized the Lucene index-to-vector dumper quite a lot (e.g. applied stop words from a file and stop regexes). What I am still wondering is how the input vectors can be reached later if I start from the cluster vectors. You say the points file somehow does that. Where can I read more about it, or can you tell me more? Is there a piece of code that would best guide me through the points format?
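To make concrete what I am after, here is a minimal sketch of the lookup I would like to end up with. It assumes the points output can somehow be read as (document label, cluster id) pairs, which is exactly the part I have not verified. The function name, the labels and the cluster ids are all made up for illustration, and nothing here is a Mahout API.

```python
from collections import defaultdict


def group_labels_by_cluster(points):
    """Invert (document_label, cluster_id) pairs into cluster_id -> [labels]."""
    clusters = defaultdict(list)
    for label, cluster_id in points:
        clusters[cluster_id].append(label)
    return dict(clusters)


# Hypothetical data standing in for whatever the points file really contains.
points = [("doc-17", 0), ("doc-42", 1), ("doc-03", 0)]

members = group_labels_by_cluster(points)
for cluster_id in sorted(members):
    # Show which input documents ended up in each cluster.
    print(cluster_id, members[cluster_id])
```

If the points format really carries the vector labels, a mapping like this would be enough to get from a cluster back to its input documents.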
On Wed, Jan 6, 2010 at 4:43 AM, Drew Farris <[email protected]> wrote:
> Each iteration of kmeans produces a clusters-X folder, with X starting
> at 0. You would get clusters-0 in cases where the clusters converge
> after the first run.
>
> Whether your clusters will retain document IDs is based on how you
> create the vectors. For example, the lucene vector dumper can be told
> to extract the value from a specific field in the index to use for the
> vector labels. These are carried through to the points file produced
> at the end of the k-means run.
>
> On Tue, Jan 5, 2010 at 9:36 PM, Bogdan Vatkov <[email protected]>
> wrote:
> > Is there some description of the content of the cluster vector?
> > I also noticed that I end up with some folders clusters-0 and clusters-1,
> > but sometimes it is only clusters-0, when do we get the different folders
> > and which should be used as end result - e.g. by the ClusterDumper?

--
Best regards,
Bogdan
