Is there a description of the output structure of the results? I also see some folders, such as points, which is used by the ClusterDumper, but I do not know the technical details. I would be interested to know what kind of data is available as a result of the clustering. Does it differ depending on which algorithm is used (kmeans, canopy, dirichlet)?
I also have one more theoretical question. For the cluster with the highest "points" count, the third term by weight is one that has a word frequency of only 9 according to the Solr dictionary (and according to my knowledge of the corpus too), and this is for 23,000+ input docs. Is this something to do with the kmeans algorithm? The rest of the terms and clusters seem to be more or less OK, but that one really astonished me. I am almost sure it is not a problem with the index-to-dictionary mapping like I had before ;) (that was a general problem then: I was using the wrong dictionary file). I am running with a convergence of 0.5; is that OK?

Best regards,
Bogdan
