Hi,
Higher orders mean many more distinct n-grams, so I'm guessing you're
running low on RAM. Are you referring to estimating or querying?
To estimate such a model from data, you simply pass the -o option to
lmplz, as you already do. lmplz also lets you cap its memory usage.
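A typical invocation, assuming lmplz is on your PATH (corpus and
output file names are illustrative), might look like:

```shell
# Estimate an 8-gram model, capping lmplz at 8 GB of RAM (-S)
# and spilling intermediate sort files to a temp directory (-T).
lmplz -o 8 -S 8G -T /tmp <corpus.txt >model.arpa
```

-S also accepts a percentage of physical memory, e.g. -S 80%.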
To query models of order above 6, you will need to recompile with e.g.
--max-kenlm-order=8 .
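For a standalone KenLM checkout built with CMake, the equivalent is
the KENLM_MAX_ORDER cache variable (a sketch; paths assume you are in
the kenlm source directory):

```shell
# Rebuild KenLM with compile-time support for up to 8-gram models.
# KENLM_MAX_ORDER raises the maximum order baked in at compile time.
mkdir -p build && cd build
cmake .. -DKENLM_MAX_ORDER=8
make -j4
```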
Regarding compression, take a look at
http://kheafield.com/code/kenlm/structures/
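The trie data structure with quantization, built via build_binary, is
the usual way to shrink a model; a minimal sketch (bit widths are
illustrative, file names assumed):

```shell
# Convert the ARPA file to KenLM's compressed trie format:
#   -q 8  quantize probabilities to 8 bits
#   -b 8  quantize backoff weights to 8 bits
build_binary -q 8 -b 8 trie model.arpa model.binary
```

Quantization trades a small amount of accuracy for a much smaller
memory footprint.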
All that said, I doubt higher orders will buy you much on a 500 MB
data set.
Kenneth
On 05/18/2015 09:48 PM, koormoosh wrote:
> Hello,
>
> I wonder why it takes a lot of time to do language modelling with kenlm
> and srilm when n goes beyond 6 (even on a relatively small dataset: 500
> MB), and is there a way to actually do high-order (6,7,8-gram) language
> modelling with srilm and kenlm on a laptop (12GB RAM)? I assume there is
> a flag somewhere that I need to set when creating the arpa or binary
> file, or during the test (computing the perplexity etc...).
>
> Thanks,
> -K
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support