Hi,
I am trying to build a language model from a large amount of text (13GB). In
the step of converting iARPA format to ARPA format, I got the following error:
/tools/irstlm-5.22.01/bin/compile-lm wiki.it.truecase.ilm.gz --text yes wiki.it.lm
inpfile: wiki.it.truecase.ilm.gz
dub: 1000
Reading
Hello Zahurul,
Have you tried the latest release of IRSTLM? It is currently at 5.40.01
which is available from here:
http://hlt.fbk.eu/en/irstlm
Updates since 5.22 are:
B.10 Version 5.30
*Support for a safe management of LMs with a total amount of n-grams larger
than 250 million*
Use of a new
this means you have run out of memory.
you can either:
--get more memory
--use less data
--use a lower-order LM
--use RandLM, which can easily handle this amount of data (I am
currently building LMs using more than 30 billion words with it, for
example)
Miles
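
To illustrate the "use less data" option above, here is a minimal Python sketch that keeps roughly a given fraction of the lines of a corpus while streaming through it, so the full 13GB never has to sit in memory. It is not part of IRSTLM or RandLM; the function name subsample_lines and its fraction and seed parameters are hypothetical, chosen only for this example.

```python
import random
from typing import Iterable, Iterator


def subsample_lines(lines: Iterable[str], fraction: float, seed: int = 0) -> Iterator[str]:
    """Yield each input line independently with probability `fraction`.

    Hypothetical helper for illustration only. Lines are processed one at a
    time, so memory use does not grow with the size of the corpus.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)  # fixed seed makes the sample reproducible
    for line in lines:
        # rng.random() is in [0, 1), so fraction=1.0 keeps every line
        # and fraction=0.0 keeps none.
        if rng.random() < fraction:
            yield line
```

One possible use is to open the training text, pass the file object to subsample_lines with something like fraction=0.25, and write the result to a new file before re-running the LM build. Whether a quarter of the data is small enough depends on the machine, so the fraction would need some trial and error.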
On 21 April 2010 09:57, Zahurul
Dear Zahurul
The newest release of IRSTLM (5.40.01) should solve your problem, which is
probably related to the size.
Please download from here:
http://hlt.fbk.eu/en/irstlm
There is an official mailing list for IRSTLM, which you can join from here:
https://list.fbk.eu/sympa/subscribe/user-irstlm
The