Jens Grivolla wrote:
> Hello,
>
> I'm looking to extract significant terms characterizing a set of
> documents (which in turn relate to a topic).
>
> This basically comes down to functionality similar to determining the
> terms with the greatest offer weight (as used for blind relevance
> feedback), or maximizing tf.idf (as is done in MoreLikeThis).
>
> Is there anything like this already implemented, or do I need to
> iterate through all documents in the set "manually", re-tokenize each
> one (or maybe use TermVectors), and then calculate the weight for each
> term?

http://project.carrot2.org/index.html may be your friend.
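For reference, the "manual" route described in the question can be sketched in a few lines. This is plain Python, not Lucene code: the names `tokenize` and `significant_terms` are hypothetical, the regex tokenizer merely stands in for a real analyzer, and the smoothed idf used here is one common variant, not the offer weight and not necessarily the formula MoreLikeThis uses.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical stand-in for an analyzer: lowercase alphanumeric runs.
    return re.findall(r"[a-z0-9]+", text.lower())


def significant_terms(subset, corpus, top_n=5):
    """Rank the terms of `subset` by tf * idf.

    tf  = occurrences of the term across all documents in `subset`
    idf = log((1 + N) / (1 + df)) + 1, with N and df taken from `corpus`
          (a smoothed variant, chosen here as an assumption).
    """
    n_docs = len(corpus)

    # Document frequency: number of corpus documents containing each term.
    doc_freq = Counter()
    for doc in corpus:
        doc_freq.update(set(tokenize(doc)))

    # Term frequency summed over the document set of interest.
    term_freq = Counter()
    for doc in subset:
        term_freq.update(tokenize(doc))

    scores = {}
    for term, freq in term_freq.items():
        idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0
        scores[term] = freq * idf

    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
```

Calling `significant_terms(topic_docs, all_docs)` returns a list of `(term, score)` pairs. With an index at hand, stored term vectors and the index's document frequencies could replace the re-tokenizing and counting steps.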
M.