I'll try and help round up some recent survey papers in this area over the next few days.
Here are two (one paper and one slide deck):

http://eprints.cs.vt.edu/archive/00001000/01/docclust.pdf
http://www.alphaminer.org/document/downloads/TextMining/(ppt)%20Survey%20of%20Text%20Clustering.pdf

My biased opinion at this point is that there aren't many new seminal works in text clustering. The hardest problem isn't the clustering algorithm itself; it's deciding which terms/phrases to cluster with and which to leave out (feature selection).

On Wed, Feb 11, 2009 at 2:13 PM, Grant Ingersoll <[email protected]> wrote:
> I've read a number of papers on it, was just looking for items that people
> recommend as a way to, potentially, round out my knowledge of the different
> approaches.
>
> I've got the Data Mining book and the Foundations book, so will refresh my
> memory on those as well
>
> On Feb 11, 2009, at 12:39 PM, Isabel Drost wrote:
>
>> On Wednesday 11 February 2009, Grant Ingersoll wrote:
>>>
>>> I'm looking for papers that you recommend on text clustering (I can,
>>> of course, go search for them, but I'd like recommendations). New,
>>> old, doesn't matter. Either send them here or add them to the wiki at
>>> http://cwiki.apache.org/confluence/display/MAHOUT/Reference+Reading
>>
>> Hmm, I know a few books that also cover the topic of clustering texts -
>> maybe one of these would be a good starting point.
>>
>> I like the book "Introduction to Information Retrieval" by Manning,
>> Raghavan and Schütze. It also contains some chapters on the topic.
>>
>> "Data Mining" from Witten and Frank has a chapter on the topic.
>>
>> "Foundations of Statistical Natural Language Processing" has a chapter as
>> well.
>>
>> Are you looking for something in particular?
>>
>> Isabel
>>
>> --
>> Check it out, send me comments, and dance joyously in the streets,
>>   -- Linus Torvalds announcing 2.0.27
>> Web: <http://www.isabel-drost.de>
>> IM: <xmpp://[email protected]>
