Thanks for the Luke hint, I will try it out. In the meantime I noticed something else that is very strange: I ran k-means on 23K+ docs with 50 clusters, and the top-term lists for the clusters all look wrong. For about 90% of the top terms I get words that I barely recognize. I did a quick check on one particular term, which sounded strange anyway and which appeared among the top terms of 9 of the 50 clusters, and found that it has "doc freq" = 2 in the Solr dictionary. How is this even possible: with 23,000 docs, a term that is mentioned only 2 times shows up as a top term in 9 clusters? I definitely did something wrong. Do you have an idea what that could be?
- Stopwords work for Solr but not for Mahout Bogdan Vatkov
