On Apr 30, 2010, at 1:15 PM, Robin Anil wrote:

> On Fri, Apr 30, 2010 at 10:40 PM, Bogdan Vatkov
> <bogdan.vat...@gmail.com> wrote:
>
>> Hi Grant,
>>
>> You are probably right.
>> What I wanted is to use my mahout setup to extract topics from a single
>> document.
>> So, maybe in popular terms I am trying to do topic extraction via document
>> clustering.
>> Does it make sense to try to split a doc into sub docs so that I leverage
>> the clustering algorithm and thus find topic which appear key ones for the
>> document?
>>
> Have you heard of LDA (Its in Mahout). Or are you trying to do something
> different for topic extraction ?

LDA is more for finding topics across docs than within a single one. You might also have a look at TextRank, which is a graph-based approach to keyword/topic extraction that is nice to implement (one of these days, I'll do it in Mahout).
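
For a feel of what TextRank does on a single document, here is a rough sketch in plain Python, not Mahout code. It builds a word co-occurrence graph and runs a PageRank-style iteration over it. The function name `textrank_keywords`, the tiny stop-word list, and the defaults for `window`, `damping`, and `iterations` are all assumptions made for illustration. The original method filters candidates by part of speech, which is replaced here by simple stop-word removal.

```python
import re
from collections import defaultdict

# Tiny illustrative stop-word list (an assumption; a real setup would use a fuller one).
STOP_WORDS = {"the", "and", "for", "that", "with", "from", "this", "are", "was", "which"}


def textrank_keywords(text, window=2, damping=0.85, iterations=50, top_k=5):
    """Rank the words of one document with a simplified TextRank."""
    # Crude tokenizer: lowercase alphabetic runs, minus stop words and very short tokens.
    words = [w for w in re.findall(r"[a-z]+", text.lower())
             if w not in STOP_WORDS and len(w) > 2]

    # Undirected co-occurrence graph: link words that appear within `window` positions.
    neighbors = defaultdict(set)
    for i, word in enumerate(words):
        for other in words[i + 1:i + 1 + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # PageRank-style update: each word shares its score equally among its neighbors.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        scores = {
            word: (1 - damping) + damping * sum(
                scores[n] / len(neighbors[n]) for n in neighbors[word])
            for word in neighbors
        }

    # Highest score first; ties broken alphabetically so the output is stable.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]


if __name__ == "__main__":
    sample = "mahout clustering mahout topics mahout documents mahout graph"
    print(textrank_keywords(sample))
```

Words that co-occur with many other well-connected words float to the top, so this needs only the one document and no corpus statistics, which is why it fits the single-document case better than clustering sub-documents.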