Re: Collocations in Mahout?

Ted Dunning Tue, 05 Jan 2010 11:59:07 -0800

We do have a partial framework for this, including log-likelihood ratio test
computation.
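
For readers unfamiliar with the test, here is a minimal sketch of the
log-likelihood ratio (G^2) statistic for a 2x2 contingency table, written in
the entropy form. It is an illustration only, not Mahout's actual code; the
function names (x_log_x, entropy, log_likelihood_ratio) and the cell names
k11..k22 are assumptions made for this example.

```python
import math


def x_log_x(x):
    """Return x * ln(x), with the usual convention that 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    """Unnormalized entropy of a list of counts: N ln N - sum(k ln k)."""
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def log_likelihood_ratio(k11, k12, k21, k22):
    """G^2 statistic for a 2x2 table of counts.

    k11: both events occur together (e.g. word A followed by word B)
    k12: A occurs without B
    k21: B occurs without A
    k22: neither occurs
    """
    row_entropy = entropy(k11 + k12, k21 + k22)
    col_entropy = entropy(k11 + k21, k12 + k22)
    matrix_entropy = entropy(k11, k12, k21, k22)
    # Clamp at zero to absorb tiny negative values from floating-point error.
    return max(0.0, 2.0 * (row_entropy + col_entropy - matrix_entropy))
```

A table whose rows and columns are independent scores near zero, and the
score grows as the two events co-occur more often than chance would predict.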

For the most part, we don't have anything that specifically counts bigrams
and words and arranges the counts in the right order for application, but
that is relatively easy to write for map-reduce.
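
As a rough illustration of that counting step, the sketch below mimics a
map-reduce job in plain Python: a mapper emits keyed counts for each bigram
and for each word in the left and right positions, a reducer sums them, and a
final pass arranges the sums into the four cells of a 2x2 table per bigram.
The function names and key layout are hypothetical, chosen for this example;
a real job would run the mapper and reducer on Hadoop and would need a join
or a second pass to bring the word counts to each bigram.

```python
from collections import Counter


def map_bigrams(tokens):
    """Mapper: emit (key, 1) for every adjacent token pair."""
    for a, b in zip(tokens, tokens[1:]):
        yield ("bigram", a, b), 1   # the pair itself
        yield ("left", a), 1        # a seen as first word of any bigram
        yield ("right", b), 1       # b seen as second word of any bigram
        yield ("total",), 1         # total number of bigrams


def reduce_counts(pairs):
    """Reducer: sum the values for each key."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return totals


def contingency_tables(totals):
    """Arrange the summed counts into (k11, k12, k21, k22) per bigram."""
    n = totals[("total",)]
    tables = {}
    for key, k11 in totals.items():
        if key[0] != "bigram":
            continue
        _, a, b = key
        k12 = totals[("left", a)] - k11    # a followed by something else
        k21 = totals[("right", b)] - k11   # b preceded by something else
        k22 = n - k11 - k12 - k21          # bigrams with neither a nor b
        tables[(a, b)] = (k11, k12, k21, k22)
    return tables
```

For example, contingency_tables(reduce_counts(map_bigrams(text.split())))
yields one tuple per distinct bigram, ready to be scored.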

I would be happy to provide pointers on the tricks I have seen to make that
easy to do if you wanted to actually type the semi-colons and such.

On Tue, Jan 5, 2010 at 9:02 AM, zaki rahaman <[email protected]> wrote:

> Pardon my ignorance as this is probably best handled by an NLP package like
> GATE or LingPipe, but does Mahout provide anything for collocations? Or does
> anyone know of a MapReducible way to calculate something like t-values for
> tokens in N-grams? I've got quite a large collection that I have to prune,
> filter, and preprocess, but I still expect it to be a significant size.
>
> --
> Zaki Rahaman
>

-- 
Ted Dunning, CTO
DeepDyve
