Pardon my ignorance as this is probably best handled by an NLP package like
GATE or LingPipe, but does Mahout provide anything for collocations? Or does
anyone know of a MapReducible way to calculate something like t-values for
tokens in N-grams? I have quite a large collection that I still need to prune,
filter, and preprocess, but I expect it to remain sizable even after that.
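
To make the question concrete, this is roughly the statistic I have in mind,
as a minimal single-machine Python sketch and not anything Mahout provides. It
assumes the usual bigram t-score, t = (x - mu) / sqrt(x / N), where x is the
observed bigram frequency c(w1 w2) / N, mu is the frequency expected under
independence, c(w1) * c(w2) / N^2, and the variance is approximated by x. The
function names (count_ngrams, merge_counts, t_score) are hypothetical, and the
tokens are assumed to be already split per document.

```python
import math
from collections import Counter


def count_ngrams(tokens):
    """Map-like step: count unigrams and adjacent bigrams in one document."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def merge_counts(partials):
    """Reduce-like step: sum the per-document counts into global counts."""
    unigrams, bigrams = Counter(), Counter()
    for doc_unigrams, doc_bigrams in partials:
        unigrams.update(doc_unigrams)
        bigrams.update(doc_bigrams)
    return unigrams, bigrams


def t_score(c12, c1, c2, n):
    """t-score for a bigram seen c12 > 0 times, given unigram counts c1 and c2
    and n total tokens. Assumption: sample variance is approximated by the
    observed bigram frequency, which is reasonable when that frequency is small."""
    x_bar = c12 / n               # observed bigram frequency
    mu = (c1 / n) * (c2 / n)      # expected frequency under independence
    return (x_bar - mu) / math.sqrt(x_bar / n)


def score_bigrams(docs):
    """Count per document, merge, then score every observed bigram."""
    unigrams, bigrams = merge_counts(count_ngrams(doc) for doc in docs)
    n = sum(unigrams.values())
    return {
        (w1, w2): t_score(c12, unigrams[w1], unigrams[w2], n)
        for (w1, w2), c12 in bigrams.items()
    }
```

Since everything the score needs is a sum of counts, I would guess the counting
could run as one MapReduce pass and the scoring as a second pass that joins
each bigram count with its two unigram counts, but I have not tried that at
scale.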

-- 
Zaki Rahaman
