On Apr 30, 2010, at 1:15 PM, Robin Anil wrote:

> On Fri, Apr 30, 2010 at 10:40 PM, Bogdan Vatkov 
> <bogdan.vat...@gmail.com> wrote:
> 
>> Hi Grant,
>> 
>> You are probably right.
>> What I wanted is to use my mahout setup to extract topics from a single
>> document.
>> So, maybe in popular terms I am trying to do topic extraction via document
>> clustering.
>> Does it make sense to try to split a doc into sub docs so that I leverage
>> the clustering algorithm and thus find the topics that appear to be the key
>> ones for the document?
>> 
> Have you heard of LDA (it's in Mahout)? Or are you trying to do something
> different for topic extraction?

That's more across docs.  You might also have a look at TextRank, which is a 
graph-based approach to keyword/topic extraction that is nice to implement (one 
of these days, I'll do it in Mahout).
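
For illustration, here is a minimal sketch of the TextRank idea applied to a
single document, in plain Python. It is not Mahout code. The function name
`textrank_keywords`, the window size, the damping factor, the iteration count,
and the tiny stopword list are all assumptions made for the example. It also
simplifies the published algorithm, which filters candidates by part of speech
rather than with a stopword list.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption; real use needs a fuller one).
STOPWORDS = {
    "the", "a", "an", "of", "to", "and", "in", "is", "it", "for",
    "on", "that", "this", "with", "as", "are", "be", "by",
}


def textrank_keywords(text, window=2, damping=0.85, iterations=50, top_n=5):
    """Rank the words of one document with a PageRank-style iteration
    over an undirected word co-occurrence graph."""
    words = [w for w in re.findall(r"[a-z]+", text.lower())
             if w not in STOPWORDS]

    # Link two distinct words when they occur within `window` positions
    # of each other in the filtered word sequence.
    neighbors = defaultdict(set)
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window + 1, len(words))):
            if words[j] != w:
                neighbors[w].add(words[j])
                neighbors[words[j]].add(w)

    if not neighbors:
        return []

    # Fixed number of update rounds; each word's score is fed by the
    # scores of its neighbors, divided by each neighbor's degree.
    scores = {w: 1.0 for w in neighbors}
    for _ in range(iterations):
        scores = {
            w: (1 - damping) + damping * sum(
                scores[n] / len(neighbors[n]) for n in neighbors[w]
            )
            for w in neighbors
        }

    # Highest score first; ties are broken alphabetically.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_n]
```

Calling `textrank_keywords(document_text)` returns the top-ranked words for
that one document, so no splitting into sub docs is needed. Whether the words
it picks are good "topics" depends heavily on the candidate filtering and the
window size.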
