Could we add all these tricks of the trade to the Moses website, for example, at http://www.statmt.org/moses/?n=Moses.Optimize (also for other topics?). I would really like that ...
Cheers,
Jörg

On Nov 25, 2014, at 12:45 PM, Holger Schwenk wrote:

> Hello,
>
> another option is to perform data selection to only keep the data relevant to
> your task.
> Usually you improve your performance, and as a nice side effect, your LM is
> much smaller ;-)
>
> Many people use the algorithm proposed by Moore and Lewis, which is
> implemented in the freely available tool XenC (on github)
>
> best,
>
> Holger
>
> On 11/25/2014 12:02 PM, Hoang Cuong wrote:
>> Hi Raj, Tom and Marcin,
>> I binarized the ARPA file last night, following your suggestion. In the end,
>> it resulted in a binarized LM file of roughly 100GB (@Marcin - it is not
>> 20-30GB as you suggested; is this size okay?)
>> Fortunately, the infrastructure at my university allows me to run
>> experiments with that.
>> Thanks a lot for your help.
>> It is so great to play with such huge LMs :))
>> Best,
>>
>>
>> On Mon, Nov 24, 2014 at 3:19 PM, Marcin Junczys-Dowmunt <[email protected]>
>> wrote:
>> The command
>>
>> moses/bin/build_binary trie -a 22 -b 8 -q 8 lm.arpa lm.kenlm
>>
>> will build a compressed binarized model with quantization. You can run
>>
>> moses/bin/build_binary lm.arpa
>>
>> without any parameters to get size estimates for different parameter
>> settings. I would guess you will get a binarized LM of roughly 20 to 30 GB
>> which is manageable (provided the size you gave us is that of an uncompressed
>> text file). You can also use lmplz to build pruned models in the first
>> place, these will be much smaller.
>>
>> On 2014-11-24 15:11, Tom Hoar wrote:
>>
>>> After binarizing such a large ARPA file with KenLM, you'll need to
>>> configure your moses.ini file to "lazily load the model using mmap." This
>>> involves using lmodel-file code "9" vs code "8." More details here:
>>> https://kheafield.com/code/kenlm/moses/
>>>
>>> Performance improves significantly if you store the binarized file on an
>>> SSD.
>>>
>>> On 11/24/2014 07:00 PM, Raj Dabre wrote:
>>>> Hey Hoang,
>>>> You should binarize the ARPA file.
>>>> The readme of the LM tool (KenLM or IRSTLM or SRILM) will tell you how.
>>>> Regards.
>>>>
>>>> On Mon, Nov 24, 2014 at 7:07 PM, Hoang Cuong <[email protected]>
>>>> wrote:
>>>> Hi all,
>>>> I have trained an (unpruned) 5-gram language model on a large corpus of 5
>>>> billion words, resulting in an ARPA-format file of roughly 300GB (is this a
>>>> normal LM size for such big monolingual data?). This is obviously too
>>>> big for running an SMT system.
>>>> I have read several papers whose systems use language models trained on
>>>> monolingual corpora of similar size. Could you give me some advice on how
>>>> to handle this, to make it feasible to run SMT systems?
>>>> I appreciate your help a lot,
>>>> Best,
>>>> --
>>>> Best Regards,
>>>> Hoang Cuong
>>>> SMTNerd
>>>>
>>>> _______________________________________________
>>>> Moses-support mailing list
>>>> [email protected]
>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>>
>>>> --
>>>> Raj Dabre.
>>>> Research Student,
>>>> Graduate School of Informatics,
>>>> Kyoto University.
>>>> CSE MTech, IITB., 2011-2014
>>
>> --
>> Best Regards,
>> Hoang Cuong
>> SMTNerd
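
For anyone who wants to see what the Moore and Lewis selection mentioned above does before trying XenC, here is a minimal sketch. It is not XenC's implementation. It assumes whitespace-tokenized sentences and uses add-one-smoothed unigram models as a stand-in for the real n-gram LMs you would normally train (for example with lmplz). The function names `train_unigram`, `cross_entropy` and `moore_lewis_select` are made up for illustration. Each candidate sentence is scored by its per-word cross-entropy under the in-domain model minus its cross-entropy under the general model, and the lowest-scoring sentences are kept.

```python
import math
from collections import Counter


def train_unigram(sentences):
    """Count tokens in a whitespace-tokenized corpus; return (counts, total)."""
    counts = Counter(tok for s in sentences for tok in s.split())
    return counts, sum(counts.values())


def cross_entropy(sentence, model, vocab_size):
    """Per-word cross-entropy in bits under an add-one-smoothed unigram model."""
    counts, total = model
    toks = sentence.split()
    logp = sum(math.log2((counts[t] + 1) / (total + vocab_size)) for t in toks)
    return -logp / len(toks)


def moore_lewis_select(in_domain, general, pool, keep):
    """Return the `keep` pool sentences with the lowest cross-entropy difference.

    score(s) = H_in(s) - H_general(s); lower means "more in-domain-like".
    """
    # Shared vocabulary so both models smooth over the same event space.
    vocab = {t for s in in_domain + general + pool for t in s.split()}
    m_in = train_unigram(in_domain)
    m_gen = train_unigram(general)
    candidates = [s for s in pool if s.split()]  # skip empty lines

    def score(s):
        return cross_entropy(s, m_in, len(vocab)) - cross_entropy(s, m_gen, len(vocab))

    return sorted(candidates, key=score)[:keep]


if __name__ == "__main__":
    in_domain = ["the patient received a dose", "the dose was reduced"]
    general = ["the match ended in a draw", "the team won the match"]
    pool = ["the patient was given a dose",
            "the team lost the match",
            "a dose was reduced"]
    for sentence in moore_lewis_select(in_domain, general, pool, keep=2):
        print(sentence)
```

In practice the general model is usually trained on a random sample of the pool of about the same size as the in-domain data, and the cut-off (`keep` here) is tuned on held-out in-domain perplexity. The LM is then trained only on the selected sentences, which is where the size reduction comes from.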
_______________________________________________ Moses-support mailing list [email protected] http://mailman.mit.edu/mailman/listinfo/moses-support
