Is there a description of the output structure of the results? I also see
some folders, such as "points", which are used by the ClusterDumper, but I do
not know the technical details.
I would be interested in what kind of data is available as a result of the
clustering. Does it differ depending on which algorithm is used (kmeans,
canopy, dirichlet)?

I also have one more theoretical question. For the cluster with the most
points, one of the top terms (the third by weight) has a word frequency of
only 9 according to the Solr dictionary (and according to my knowledge of the
corpus too), and this is for 23,000+ input documents. Could this be caused by
the kmeans algorithm? The rest of the terms and clusters seem more or less OK,
but that one really astonished me. I am almost sure it is not a problem with
the index-to-dictionary mapping like the one I had before ;) (but that was a
general problem: I was using the wrong dictionary file).
I am running with a convergence delta of 0.5. Is that OK?

Best regards,
Bogdan
