Sami, do you use the whole set available at http://people.csail.mit.edu/people/koehn/publications/europarl/ , or just some parts of the text to build the profiles? (If I remember my previous work on n-grams correctly, just a few MB are enough to get a representative set of 3-grams.)

I used a relatively small subset, just a few MB, to build the profiles.

Project Gutenberg is a good place to find source texts that can be used to generate n-grams for a range of languages:
http://www.gutenberg.org/
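
For anyone who wants to try this, here is a rough sketch of how a 3-gram profile could be built from a small text sample. It is only an illustration, not the code used for the profiles discussed here: the function names (`read_sample`, `build_profile`), the 2 MB read limit, the whitespace normalization, and the cut-off of 300 n-grams are all assumptions made for the example.

```python
from collections import Counter


def read_sample(path, max_bytes=2_000_000, encoding="utf-8"):
    """Read at most max_bytes from a text file (hypothetical helper).

    A few MB is assumed to be enough for a representative profile.
    """
    with open(path, "rb") as f:
        # errors="ignore" drops a multi-byte character cut off at the limit
        return f.read(max_bytes).decode(encoding, errors="ignore")


def build_profile(text, n=3, top_k=300):
    """Return relative frequencies of the top_k most common character n-grams."""
    # Lowercase and collapse runs of whitespace so line breaks in the
    # source text do not produce spurious n-grams.
    normalized = " ".join(text.lower().split())
    counts = Counter(
        normalized[i:i + n] for i in range(len(normalized) - n + 1)
    )
    total = sum(counts.values())
    if total == 0:
        return {}  # text shorter than n characters
    return {gram: count / total for gram, count in counts.most_common(top_k)}
```

Usage would be something like `profile = build_profile(read_sample("fi-sample.txt"))`, where the file name is just a placeholder for a plain-text sample in one language.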




