right, shit in - shit out :-).
True. But in most cases clustering of search results can yield sensible
clusters. Try, for example:
http://demo.carrot-search.com/carrot2-remote-controller/newsearch.do?query=chips&processingChain=carrot2.process.lingo-cluster-odp&resultsRequested=200
We in fact use Lucene for this demo (indexing ODP categories) --
http://www.carrot-search.com/demos.html
An open source clustering component isn't much worse (with Google
serving as the data source):
http://carrot.cs.put.poznan.pl/carrot2-remote-controller/newsearch.do?query=chips&processingChain=carrot2.process.lingo-google-en&resultsRequested=100
Compare it with AllTheWeb as the data source (same algorithm):
http://carrot.cs.put.poznan.pl/carrot2-remote-controller/newsearch.do?query=chips&processingChain=carrot2.process.lingo-alltheweb-en&resultsRequested=100
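The two carrot.cs.put.poznan.pl links above differ only in the
processingChain parameter, so the comparison is easy to repeat for other
queries. A minimal sketch of building such links follows; the helper
name demo_url is made up for illustration, and only the parameter names
and chain ids that appear in the links above are used:

```python
from urllib.parse import urlencode

# Base address taken from the demo links quoted in this message.
BASE = "http://carrot.cs.put.poznan.pl/carrot2-remote-controller/newsearch.do"


def demo_url(query, chain, results=100):
    """Build a demo link (hypothetical helper, not part of Carrot2)."""
    params = {
        "query": query,
        "processingChain": chain,
        "resultsRequested": results,
    }
    return BASE + "?" + urlencode(params)


# Same algorithm (Lingo), two different data sources.
google_url = demo_url("chips", "carrot2.process.lingo-google-en")
alltheweb_url = demo_url("chips", "carrot2.process.lingo-alltheweb-en")
```

Changing only the first argument keeps the algorithm and both data
sources fixed, so any difference between the two result pages comes from
the input.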
As you said -- much depends on the data, but the clustering algorithm
itself also leaves a lot of room for differences (try identical inputs
with different algorithms and you'll see).
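To make the "identical inputs, different algorithms" point concrete,
here is a toy sketch. The snippets are invented for illustration, and
neither function is Lingo or any other Carrot2 algorithm; they are two
deliberately simple stand-ins that group the same five snippets
differently.

```python
from collections import Counter, defaultdict

# Invented snippets standing in for search results for "chips".
SNIPPETS = [
    "potato chips recipe",
    "potato chips salt snack",
    "computer chips intel",
    "memory chips computer",
    "poker chips casino",
]
STOP = {"chips"}  # the query term itself does not discriminate


def tokens(text):
    return [w for w in text.lower().split() if w not in STOP]


def cluster_by_top_term(docs):
    """Put each doc under its term with the highest document frequency."""
    df = Counter(t for d in docs for t in set(tokens(d)))
    groups = defaultdict(list)
    for d in docs:
        # Ties are broken alphabetically to keep the result deterministic.
        label = min(tokens(d), key=lambda t: (-df[t], t))
        groups[label].append(d)
    return dict(groups)


def cluster_by_jaccard(docs, threshold=0.5):
    """Greedy single pass: join the first cluster whose seed doc is
    at least `threshold` similar (Jaccard on token sets)."""
    clusters = []
    for d in docs:
        s = set(tokens(d))
        for c in clusters:
            seed = set(tokens(c[0]))
            if len(s & seed) / len(s | seed) >= threshold:
                c.append(d)
                break
        else:
            clusters.append([d])
    return clusters


by_term = cluster_by_top_term(SNIPPETS)
by_jaccard = cluster_by_jaccard(SNIPPETS)
```

On this input the first function gives three groups (potato, computer,
casino), while the second leaves every snippet in its own cluster at a
threshold of 0.5, even though the data is the same.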
D.