Thanks for the suggestion. Can you point me to a specific NLTK toolset or approach, with an example, if you have already used one?
On Tue, Jun 14, 2011 at 12:57 PM, Venkatraman S <venka...@gmail.com> wrote:
> On Tue, Jun 14, 2011 at 12:07 PM, Gopalakrishnan Subramani <
> gopalakrishnan.subram...@gmail.com> wrote:
>
> > Jayalalithaa meets PM, DMK watches closely
> > Jaya to meet PM today in New Delhi
> > Jaya-PM meet, 'jittery' DMK watches on Times
> >
> > How do I do this in Python? I think NLTK is too large for me to learn
> > and use. Is there a simpler, more fun way to do it?
>
> 1) NLTK is pretty simple. You can do duplicate detection pretty easily;
> look for sample code.
>
> 2) Generate keywords from the content and check the correlation
> between documents.
>
> 3) For headlines alone: do substring matching? (But this would ignore the
> semantics of the text, i.e. 'Jayalalitha was last seen in Kodagu estate'
> and 'Real estate would get a boost under Jayalalitha' would be put
> in the same group.)
>
> -V
> http://blizzardzblogs.blogspot.com/
> _______________________________________________
> BangPypers mailing list
> BangPypers@python.org
> http://mail.python.org/mailman/listinfo/bangpypers
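The keyword-overlap idea in point 2 of the quoted reply can be tried with just the standard library before reaching for NLTK. What follows is a rough sketch, not NLTK's duplicate-detection method: it turns each headline into a set of keywords and scores pairs with Jaccard similarity (shared keywords divided by all keywords). The small stopword list, the 4-letter prefix truncation used as a crude stand-in for stemming, the 0.25 threshold and the extra "Sensex" headline are all my own assumptions, not anything from the thread.

```python
import re
from itertools import combinations

# Small, hand-picked English stopword list (an assumption for this sketch;
# NLTK ships a fuller list as nltk.corpus.stopwords).
STOPWORDS = {"a", "an", "and", "at", "for", "in", "is", "of", "on", "the", "to", "was"}


def keywords(headline, prefix_len=4):
    """Return the headline's keywords as a set.

    Lowercases the text, keeps alphabetic tokens, drops stopwords and
    truncates each word to its first `prefix_len` letters. The truncation
    is a crude stand-in for stemming, so 'meets'/'meet' and
    'Jayalalithaa'/'Jaya' collapse to the same key.
    """
    tokens = re.findall(r"[a-z]+", headline.lower())
    return {t[:prefix_len] for t in tokens if t not in STOPWORDS}


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two keyword sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_headlines(headlines, threshold=0.25):
    """Cluster headlines whose pairwise keyword overlap is >= threshold.

    Similar pairs are merged with a small union-find, so if A~B and B~C,
    all three end up in one group even if A and C score low on their own.
    """
    parent = list(range(len(headlines)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    sets = [keywords(h) for h in headlines]
    for i, j in combinations(range(len(headlines)), 2):
        if jaccard(sets[i], sets[j]) >= threshold:
            parent[find(i)] = find(j)

    # Collect headlines by their root, preserving input order.
    groups = {}
    for i, h in enumerate(headlines):
        groups.setdefault(find(i), []).append(h)
    return list(groups.values())


if __name__ == "__main__":
    headlines = [
        "Jayalalithaa meets PM, DMK watches closely",
        "Jaya to meet PM today in New Delhi",
        "Jaya-PM meet, 'jittery' DMK watches on Times",
        "Sensex falls 200 points in early trade",  # hypothetical unrelated headline
    ]
    for group in group_headlines(headlines):
        print(group)
```

Run on the three headlines from the thread plus one unrelated headline, this should put the three Jayalalithaa/PM headlines in one group and the unrelated one on its own. Like substring matching, it is still a bag-of-words measure that ignores word sense, so headlines that merely share words such as 'estate' can score high too; the threshold needs tuning on real headlines.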