Re: Methods for Naming Clusters

Dawid Weiss Wed, 06 Jan 2010 00:56:57 -0800

I haven't seen this paper, but I believe methods other than SVD may
yield similar results. This in fact was one of the inspirations for my
own PhD thesis -- I used a simple numerical clustering technique
instead of SVD with a similar outcome. I remember papers like this
one:

Inderjit S. Dhillon and Dharmendra S. Modha. Concept Decompositions
for Large Sparse Text Data Using Clustering. Machine Learning,
42(1–2):143–175, 2001.

claiming that even simple clustering techniques allow you to
approximate matrix decompositions sensibly for the task of document
retrieval, for example. Interesting.
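
To make the idea concrete, here is a minimal sketch. It is a simplification
and not the procedure from the paper: it assumes cosine-based ("spherical")
k-means on unit-length document vectors, and it approximates each document
only by its projection onto its own cluster's normalized centroid. The toy
data, the seeds and all function names are made up for illustration.

```python
import math

def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def spherical_kmeans(docs, seeds, iterations=10):
    """Cluster unit-length vectors by cosine similarity.

    Returns (concepts, labels), where each concept is a normalized centroid.
    """
    concepts = [normalize(s) for s in seeds]
    labels = [0] * len(docs)
    for _ in range(iterations):
        # Assign every document to the most similar concept vector.
        labels = [max(range(len(concepts)), key=lambda j: dot(d, concepts[j]))
                  for d in docs]
        # Recompute each concept as the normalized sum of its members.
        for j in range(len(concepts)):
            members = [d for d, l in zip(docs, labels) if l == j]
            if members:
                concepts[j] = normalize([sum(col) for col in zip(*members)])
    return concepts, labels

def approximate(docs, concepts, labels):
    """Replace each document by its projection onto its own concept vector."""
    result = []
    for d, l in zip(docs, labels):
        weight = dot(d, concepts[l])
        result.append([weight * x for x in concepts[l]])
    return result

# Toy data: rows are documents, columns are term weights (assumed values).
docs = [normalize(v) for v in [
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 1, 2],
    [0, 0, 2, 1],
]]
concepts, labels = spherical_kmeans(docs, seeds=[docs[0], docs[2]])
approx = approximate(docs, concepts, labels)

# Frobenius norm of the difference between the data and its approximation.
error = math.sqrt(sum((a - b) ** 2
                      for d, r in zip(docs, approx)
                      for a, b in zip(d, r)))
print(labels, round(error, 3))
```

Two concept vectors stand in for a rank-2 decomposition here; how close this
gets to a truncated SVD on real data depends on the collection and on the
number of clusters.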

D.


On Mon, Jan 4, 2010 at 11:12 PM, Ted Dunning <[email protected]> wrote:
> Btw... relative to the cost of decomposition, have you seen the recent spate
> of articles on stochastic decomposition?  It can dramatically speed up LSA.
>
> See http://arxiv.org/abs/0909.4061v1 for a good survey.  My guess is that
> you don't even need to do the SVD and could just use a random projection
> with a single power step (which is nearly equivalent to random indexing).
>
> On Mon, Jan 4, 2010 at 11:57 AM, Dawid Weiss <[email protected]> wrote:
>
>> We agree, it was just me explaining things vaguely. The bottom line
>> is: a lot depends on what you're planning to do with the clusters and
>> the methodology should be suitable to this.
>>
>> Dawid
>>
>>
>> On Mon, Jan 4, 2010 at 8:53 PM, Ted Dunning <[email protected]> wrote:
>> > I think I agree with this for clusters that are intended for human
>> > consumption, but I am sure that I disagree with this if you are looking
>> > to use the clusters internally for machine learning purposes.
>> >
>> > The basic idea for the latter is that the distances to a bunch of
>> > clusters can be used as a description of a point.  This description in
>> > terms of distances to cluster centroids can make some machine learning
>> > tasks vastly easier.
>> >
>> > On Mon, Jan 4, 2010 at 11:44 AM, Dawid Weiss <[email protected]> wrote:
>> >
>> >> What's worse -- neither method is "better". We at Carrot2 have a
>> >> strong feeling that clusters should be described properly in order to
>> >> be useful, but one may argue that in many, many applications of
>> >> clustering, the labels are _not_ important and just individual
>> >> features of clusters (like keywords or even documents themselves) are
>> >> enough.
>> >>
>> >
>> >
>> >
>> > --
>> > Ted Dunning, CTO
>> > DeepDyve
>> >
>>
>
>
>
> --
> Ted Dunning, CTO
> DeepDyve
>
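
The point quoted above about describing a point by its distances to cluster
centroids can be sketched as follows. This is only an illustration under
made-up assumptions: the centroids, the ring-shaped toy classes and the
function name are hypothetical and would normally come from a real
clustering run.

```python
import math

def centroid_distance_features(point, centroids):
    """Describe a point by its Euclidean distance to each centroid."""
    return [math.dist(point, c) for c in centroids]

# Hypothetical centroids, e.g. taken from an earlier k-means run.
centroids = [(0.0, 0.0), (6.0, 0.0)]

# Two classes arranged as concentric rings around the first centroid.
# No straight line separates them in the raw (x, y) coordinates.
inner_ring = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)]
outer_ring = [(3.0, 0.0), (0.0, 3.0), (-3.0, 0.0), (0.0, -3.0)]

inner_features = [centroid_distance_features(p, centroids) for p in inner_ring]
outer_features = [centroid_distance_features(p, centroids) for p in outer_ring]

# In the distance-based description, a single threshold on the first
# feature (distance to the first centroid) separates the two classes.
threshold = 2.0
inner_ok = all(f[0] < threshold for f in inner_features)
outer_ok = all(f[0] > threshold for f in outer_features)
print(inner_ok, outer_ok)
```

The same feature vectors could then be fed to any standard classifier in
place of the raw coordinates.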
