I haven't seen this paper, but I believe methods other than SVD may yield similar results. This in fact was one of the inspirations for my own PhD thesis -- I used a simple numerical clustering technique instead of SVD with a similar outcome. I remember papers like this one:

  Inderjit S. Dhillon and Dharmendra S. Modha. Concept Decompositions for
  Large Sparse Text Data Using Clustering. Machine Learning,
  42(1–2):143–175, 2001.

claiming that even simple clustering techniques allow you to approximate
matrix decompositions sensibly for the task of document retrieval, for
example. Interesting.

D.

On Mon, Jan 4, 2010 at 11:12 PM, Ted Dunning <[email protected]> wrote:

> Btw... relative to the cost of decomposition, have you seen the recent spate
> of articles on stochastic decomposition? It can dramatically speed up LSA.
>
> See http://arxiv.org/abs/0909.4061v1 for a good survey. My guess is that
> you don't even need to do the SVD and could just use a random projection
> with a single power step (which is nearly equivalent to random indexing).
>
> On Mon, Jan 4, 2010 at 11:57 AM, Dawid Weiss <[email protected]> wrote:
>
>> We agree, it was just me explaining things vaguely. The bottom line
>> is: a lot depends on what you're planning to do with the clusters and
>> the methodology should be suitable to this.
>>
>> Dawid
>>
>>
>> On Mon, Jan 4, 2010 at 8:53 PM, Ted Dunning <[email protected]> wrote:
>>
>> > I think I agree with this for clusters that are intended for human
>> > consumption, but I am sure that I disagree with this if you are looking to
>> > use the clusters internally for machine learning purposes.
>> >
>> > The basic idea for the latter is that the distances to a bunch of clusters
>> > can be used as a description of a point. This description in terms of
>> > distances to cluster centroids can make some machine learning tasks vastly
>> > easier.
>> >
>> > On Mon, Jan 4, 2010 at 11:44 AM, Dawid Weiss <[email protected]> wrote:
>> >
>> >> What's worse -- neither method is "better". We at Carrot2 have a
>> >> strong feeling that clusters should be described properly in order to
>> >> be useful, but one may argue that in many, many applications of
>> >> clustering, the labels are _not_ important and just individual
>> >> features of clusters (like keywords or even documents themselves) are
>> >> enough.
>> >>
>> >
>> > --
>> > Ted Dunning, CTO
>> > DeepDyve
>> >
>
> --
> Ted Dunning, CTO
> DeepDyve
>
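
To make the "distances to a bunch of clusters as a description of a point" idea from the quoted thread concrete, here is a minimal sketch. It assumes Euclidean distance and centroids that have already been computed by some clustering run; the function name and the sample centroids are hypothetical and chosen only for illustration, not taken from Mahout or Carrot2.

```python
import math

def centroid_distance_features(point, centroids):
    """Describe a point by its Euclidean distance to each cluster centroid.

    Assumption: centroids come from any earlier clustering step (e.g. k-means).
    """
    return [math.dist(point, c) for c in centroids]

# Hypothetical 2-D centroids, made up for this example.
centroids = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]

# The point is re-expressed as one feature per cluster.
features = centroid_distance_features((3.0, 4.0), centroids)
print(features)  # [5.0, 0.0, 5.0]
```

The resulting vector has one entry per cluster, so a downstream learner sees a point in terms of how close it is to each centroid rather than in the original (possibly very sparse) feature space.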

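The "random projection with a single power step" suggestion in the quoted thread could be sketched roughly as follows. This is only one possible reading, under stated assumptions: rows of A are documents, the test matrix is Gaussian, and the power step is a single multiplication by A Aᵀ with no orthonormalization and no SVD afterwards. All function names are hypothetical, and the code uses plain Python lists, so it is suitable for toy inputs only.

```python
import random

def matmul(X, Y):
    """Dense matrix product of two lists-of-rows."""
    cols = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def random_projection_power_step(A, k, seed=0):
    """Return a len(A) x k description of the rows of A.

    Computes Y = A Aᵀ (A Ω) for a random Gaussian Ω (assumed choice).
    """
    rng = random.Random(seed)
    n_cols = len(A[0])
    omega = [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(n_cols)]
    Y = matmul(A, omega)                      # plain random projection: A Ω
    return matmul(A, matmul(transpose(A), Y))  # one power step: A Aᵀ (A Ω)

# Toy document-term matrix (made-up counts): 3 documents, 4 terms.
A = [[1.0, 0.0, 2.0, 0.0],
     [0.0, 1.0, 0.0, 3.0],
     [2.0, 0.0, 4.0, 0.0]]
Y = random_projection_power_step(A, k=2)
```

Each row of Y is a k-dimensional stand-in for the corresponding document. The extra multiplication by A Aᵀ weights the projection toward the dominant directions of A, which is the role the power step plays in the stochastic decomposition survey linked above.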