Your application works very well, congrats! May I ask what the input looks like? How are the terms selected, and how do you model phrases? Do you handle titles differently from the short summaries?
What I am doing is this: I remove stopwords, stem the terms using Snowball's default English stemmer, and then build feature vectors directly from the selected terms. I don't have any information about phrases in there yet.

I ask because the descriptions of your clusters are very nice. How are they produced? (I know you are using SVD for this, and so am I, but I only have single terms, whereas you have nicely formulated phrases.)

Cheers
Daniel

Dawid Weiss wrote:
>
>> right, shit in - shit out :-).
>
> True. But in most cases clustering of search results can yield
> sensible clusters. Try, for example:
>
> http://demo.carrot-search.com/carrot2-remote-controller/newsearch.do?query=chips&processingChain=carrot2.process.lingo-cluster-odp&resultsRequested=200
>
> We in fact use Lucene for this demo (indexing ODP categories) --
>
> http://www.carrot-search.com/demos.html
>
> An open source clustering component isn't much worse (with Google
> serving as the data source):
>
> http://carrot.cs.put.poznan.pl/carrot2-remote-controller/newsearch.do?query=chips&processingChain=carrot2.process.lingo-google-en&resultsRequested=100
>
> Compare it with (same algorithm) AllTheWeb:
>
> http://carrot.cs.put.poznan.pl/carrot2-remote-controller/newsearch.do?query=chips&processingChain=carrot2.process.lingo-alltheweb-en&resultsRequested=100
>
> As you said -- much depends on the data, but there is also a lot of
> space for the clustering algorithm (try identical inputs and different
> algorithms and you'll see the difference).
>
> D.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
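
For readers following the thread, here is a minimal sketch of the kind of preprocessing described in the message above: stopword removal, stemming, then single-term feature vectors. It is only an illustration under stated assumptions, not the code actually used by either poster. The stopword list is a tiny made-up sample, and `toy_stem` is a crude suffix stripper standing in for the Snowball English stemmer so the example runs with the standard library alone. All function names here are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"a", "an", "and", "the", "of", "in", "for", "is", "are", "to", "with"}


def toy_stem(word):
    """Crude suffix stripper. A stand-in for the Snowball English stemmer,
    NOT an implementation of it."""
    for suffix in ("ing", "es", "s"):
        # Only strip when at least three characters of stem remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Lowercase the text and split it into runs of ASCII letters."""
    return re.findall(r"[a-z]+", text.lower())


def preprocess(text):
    """Drop stopwords, then stem the remaining tokens."""
    return [toy_stem(t) for t in tokenize(text) if t not in STOPWORDS]


def build_vectors(docs):
    """Return a sorted vocabulary and one raw term-count vector per document."""
    counts_per_doc = [Counter(preprocess(d)) for d in docs]
    vocabulary = sorted(set().union(*counts_per_doc))
    vectors = [[counts[term] for term in vocabulary] for counts in counts_per_doc]
    return vocabulary, vectors


docs = [
    "Potato chips and snacks",
    "Computer chips for the processor",
]
vocabulary, vectors = build_vectors(docs)
for doc, vec in zip(docs, vectors):
    print(doc, "->", dict(zip(vocabulary, vec)))
```

Vectors built this way contain only single stemmed terms, which is why cluster labels derived from them come out as isolated words rather than phrases. Carrying phrase information would need an extra step before the vectors are built; the thread does not say how Carrot2 does this.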