Your application works very well, congrats! May I ask what the input
looks like? How are the terms selected, and how do you model phrases? Do
you handle titles differently from the short summaries?

What I am doing is this: I remove stopwords, stem the terms using
Snowball's default English stemmer, and then build feature vectors for
the selected terms. I don't have any information about phrases in there
yet. I ask because the descriptions of your clusters are very nice. How
are they done? (I know you are using SVD to do it, and I am too, but I
only have single terms, and you have nicely formulated phrases.)
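
To make my side concrete, here is a minimal sketch of that preprocessing.
It is not my actual code: the stop list is a toy one, `crude_stem` is a
hypothetical stand-in for the Snowball English stemmer (a real setup would
call the stemmer library instead), and all function names are made up for
illustration.

```python
import re
from collections import Counter

# Toy stop list (assumption for illustration); a real one is much longer.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "is", "are", "for", "to", "with"}


def crude_stem(term):
    """Hypothetical stand-in for the Snowball English stemmer.

    Strips a few common suffixes, keeping at least three characters.
    """
    for suffix in ("ing", "es", "s"):
        if term.endswith(suffix) and len(term) - len(suffix) >= 3:
            return term[: -len(suffix)]
    return term


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def preprocess(text):
    """Remove stopwords, then stem the remaining single terms."""
    return [crude_stem(t) for t in tokenize(text) if t not in STOPWORDS]


def build_vectors(docs):
    """Return (vocabulary, vectors) with one term-frequency vector per document."""
    counts = [Counter(preprocess(d)) for d in docs]
    vocabulary = sorted(set().union(*counts))
    vectors = [[c[t] for t in vocabulary] for c in counts]
    return vocabulary, vectors
```

Calling `build_vectors` on a list of snippets gives one count vector per
snippet over single stemmed terms only, which is why no phrases can show
up in my cluster descriptions.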

Cheers
Daniel


Dawid Weiss wrote:

>
>> right, shit in - shit out :-).
>
>
> True. But in most cases clustering of search results can yield
> sensible clusters. Try, for example:
>
> http://demo.carrot-search.com/carrot2-remote-controller/newsearch.do?query=chips&processingChain=carrot2.process.lingo-cluster-odp&resultsRequested=200
>
>
> We in fact use Lucene for this demo (indexing ODP categories) --
>
> http://www.carrot-search.com/demos.html
>
> An open source clustering component isn't much worse (with Google
> serving as the data source):
>
> http://carrot.cs.put.poznan.pl/carrot2-remote-controller/newsearch.do?query=chips&processingChain=carrot2.process.lingo-google-en&resultsRequested=100
>
>
> Compare it with (same algorithm) AllTheWeb:
>
> http://carrot.cs.put.poznan.pl/carrot2-remote-controller/newsearch.do?query=chips&processingChain=carrot2.process.lingo-alltheweb-en&resultsRequested=100
>
>
> As you said -- much depends on the data, but there is also a lot of
> room for the clustering algorithm to make a difference (try identical
> inputs with different algorithms and you'll see it).
>
> D.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>

