Re: Clustering techniques, tips and tricks

Bogdan Vatkov Tue, 05 Jan 2010 18:52:03 -0800

I have already customized the Lucene index-to-vector dumper quite a lot
(e.g. applied stop words from a file and a stop regex), but I am wondering
how the input vectors can later be reached if I start from the cluster
vectors. You say the points file somehow does that. Where can I read more,
or can you tell me more? Or is there a piece of code that would best guide
me through the points format?


On Wed, Jan 6, 2010 at 4:43 AM, Drew Farris <[email protected]> wrote:

> Each iteration of k-means produces a clusters-X folder, with X starting
> at 0. You would get only clusters-0 in cases where the clusters converge
> after the first run.
>
> Whether your clusters will retain document IDs depends on how you
> create the vectors. For example, the Lucene vector dumper can be told
> to extract the value from a specific field in the index to use for the
> vector labels. These are carried through to the points file produced
> at the end of the k-means run.
>
> On Tue, Jan 5, 2010 at 9:36 PM, Bogdan Vatkov <[email protected]>
> wrote:
> > Is there some description of the content of the cluster vector?
> > I also noticed that I end up with folders clusters-0 and clusters-1,
> > but sometimes only clusters-0. When do we get the different folders,
> > and which should be used as the end result, e.g. by the ClusterDumper?
>
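
To illustrate the clusters-X naming described in the quoted reply, here is a
minimal sketch of picking the highest-numbered clusters-X folder from a
k-means output directory. It assumes the output has been copied to a local
filesystem (not read from HDFS) and that the highest X corresponds to the last
iteration that ran. The function name latest_clusters_dir and the directory
layout are hypothetical, chosen for illustration; this is not part of Mahout.

```python
import re
from pathlib import Path
from typing import Optional

# Matches folder names such as clusters-0, clusters-1, clusters-12.
CLUSTERS_DIR = re.compile(r"^clusters-(\d+)$")


def latest_clusters_dir(output_dir: str) -> Optional[Path]:
    """Return the clusters-X folder with the largest X, or None if absent.

    Assumption: each k-means iteration wrote one clusters-X folder directly
    under output_dir, with X starting at 0.
    """
    best = None
    best_n = -1
    for child in Path(output_dir).iterdir():
        match = CLUSTERS_DIR.match(child.name)
        # Compare X numerically so that clusters-10 ranks above clusters-9.
        if child.is_dir() and match and int(match.group(1)) > best_n:
            best, best_n = child, int(match.group(1))
    return best
```

If only clusters-0 exists, the function returns that folder, which matches the
case where the clusters converge after the first run.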



-- 
Best regards,
Bogdan
