Thanks, Ken, for all your feedback. One more question: I'm using Moses with Boost. I uncommented the line #define USE_BOOST in kenlm/util/string_piece.hh and recompiled Moses without problems.
Then I uncommented #define USE_ICU, and now ./configure fails with the error log below. libicu-dev and libicu42 are installed on my system, and each compile started from a clean Moses download. Is USE_ICU usable or necessary with Moses?

Thanks,
Tom

configure: Using Boost library
checking for boostlib >= 1.36.0... yes
configure: Building threaded moses
checking whether the Boost::Thread library is available... yes
checking for exit in -lboost_thread-mt... yes
checking Ngram.h usability... yes
checking Ngram.h presence... yes
checking for Ngram.h... yes
checking for trigram_init in -loolm... yes
checking n_gram.h usability... yes
checking n_gram.h presence... yes
checking for n_gram.h... yes
checking lm/ngram.hh usability... no
checking lm/ngram.hh presence... yes
checking for lm/ngram.hh... no
configure: WARNING: lm/ngram.hh: present but cannot be compiled
configure: WARNING: lm/ngram.hh: check for missing prerequisite headers?
configure: WARNING: lm/ngram.hh: see the Autoconf documentation
configure: WARNING: lm/ngram.hh: section "Present But Cannot Be Compiled"
configure: WARNING: lm/ngram.hh: proceeding with the compiler's result
configure: error: Cannot find KEN-LM in yes

On Tue, 26 Oct 2010 12:48:13 -0400, Kenneth Heafield <mo...@kheafield.com> wrote:
> Yes, I require <s> and </s> to appear in your ARPA file. These tags are
> important from an output quality perspective (BLEU etc.). I'll put that
> in the documentation when I get around to writing it, but I personally
> think IRST should include them by default.
>
> Kenneth
>
> On 10/26/10 12:30, supp...@precisiontranslationtools.com wrote:
>> Thanks Ken. I tested it and it works.
>>
>> FYI, on my first attempt there was a different error. Something about the
>> <s> token (word?) was missing. I added the <s></s> tags, re-ran IRSTLM's
>> build-lm.sh script with option -b (include sentence boundary n-grams), and
>> the error disappeared.
>>
>> It's pretty fast now. I look forward to testing the optimized code.
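Ken's requirement in the 26 Oct reply quoted above, that <s> and </s> appear in the ARPA file, can be checked before running build_binary. The following is a minimal Python sketch, not a KenLM or IRSTLM tool: missing_boundary_tokens is a hypothetical helper, and it assumes a plain-text ARPA file whose \1-grams: entries are whitespace-separated as "logprob word [backoff]". The toy ARPA text is made up for illustration.

```python
import io
from typing import Iterable, Set

# Sentence-boundary tokens that, per Ken, must appear in the ARPA file.
REQUIRED = {"<s>", "</s>"}


def missing_boundary_tokens(lines: Iterable[str]) -> Set[str]:
    """Return whichever of <s> and </s> is absent from the \\1-grams: section."""
    in_unigrams = False
    seen: Set[str] = set()
    for raw in lines:
        line = raw.strip()
        if line == "\\1-grams:":
            in_unigrams = True
            continue
        # Any other backslash header (\data\, \2-grams:, \end\) leaves the unigram section.
        if line.startswith("\\"):
            in_unigrams = False
            continue
        if in_unigrams and line:
            fields = line.split()  # unigram entry: logprob word [backoff]
            if len(fields) >= 2:
                seen.add(fields[1])
    return REQUIRED - seen


# Hypothetical toy ARPA text with both boundary tokens present.
toy = "\\data\\\nngram 1=3\n\n\\1-grams:\n-99\t<s>\t-0.5\n-0.5\tthe\t-0.3\n-0.7\t</s>\n\n\\end\\\n"
print(missing_boundary_tokens(io.StringIO(toy)))  # set()
```

On a real file, passing an open file object (for example the arpa.en.lm from Tom's log) works the same way, since iterating a text file yields its lines. An empty result means both tokens are present; anything else suggests rebuilding the LM with IRSTLM's -b option, as Tom did.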
>>
>> Tom
>>
>> On Tue, 26 Oct 2010 10:18:17 -0400, Kenneth Heafield <mo...@kheafield.com> wrote:
>>> I've fixed this in revision 3657 and tested that it works with a toy
>>> IRSTLM example.
>>>
>>> Sorry about that,
>>>
>>> Kenneth
>>>
>>> P.S. A faster version is under code review and coming soon.
>>>
>>> On 10/26/10 03:57, Nicola Bertoldi wrote:
>>>> The empty line after each n-gram block is not mandatory in the ARPA format
>>>> (see http://www.speech.sri.com/projects/srilm/manpages/ngram-format.5.html),
>>>> and IRSTLM does not produce it.
>>>>
>>>> Best regards,
>>>> Nicola Bertoldi
>>>>
>>>> On Oct 26, 2010, at 9:42 AM, <supp...@precisiontranslationtools.com> wrote:
>>>>
>>>>> Hi Ken,
>>>>>
>>>>> I created an iARPA file with IRSTLM using the options -n 3 (3-grams), -b
>>>>> (include the <s> sentence boundary) and -d (subdictionary for n-grams).
>>>>> Then I used IRSTLM's compile-lm with --text yes to convert it to ARPA
>>>>> format.
>>>>>
>>>>> Finally, I ran build_binary to binarize the ARPA file for KenLM. I got
>>>>> the following error:
>>>>>
>>>>> $ build_binary arpa.en.lm arpa.en.binary
>>>>> Reading lm.en.lm
>>>>> ----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
>>>>>
>>>>> terminate called after throwing an instance of 'lm::FormatLoadException'
>>>>> what(): Expected blank line after 3-grams at byte 22348989 in file arpa.en.lm
>>>>> Aborted
>>>>>
>>>>> What am I missing?
>>>>>
>>>>> Thanks,
>>>>> Tom
>>>>>
>>>>> On Fri, 22 Oct 2010 10:15:21 -0400, Kenneth Heafield <mo...@kheafield.com> wrote:
>>>>>> KenLM is inference-only. It cannot create ARPA files, so you'll need
>>>>>> to use your favorite toolkit to generate the ARPA file.
>>>>>>
>>>>>> On 10/22/10 07:52, supp...@precisiontranslationtools.com wrote:
>>>>>>> Thanks Ken. Nice work.
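Nicola's point in the thread is that the blank line after each n-gram block is optional in ARPA files, which is what produced the "Expected blank line after 3-grams" error before Ken's fix in revision 3657. A tolerant reader treats the next \N-grams: header (or \end\) as the end of the previous block instead of waiting for a blank line. The following is a minimal Python sketch of that idea, not KenLM's parser: count_ngrams is a hypothetical helper that only counts entries per order and compares them with the "ngram N=count" lines of the \data\ header, and the toy input is made up.

```python
import io
import re
from typing import Dict, Iterable

# "\N-grams:" section header and "ngram N=count" line of the \data\ header.
SECTION = re.compile(r"^\\(\d+)-grams:$")
COUNT = re.compile(r"^ngram\s+(\d+)\s*=\s*(\d+)$")


def count_ngrams(lines: Iterable[str]) -> Dict[int, int]:
    """Count entries per n-gram order without requiring blank separator lines.

    Raises ValueError if the counts disagree with the \\data\\ header.
    """
    declared: Dict[int, int] = {}
    found: Dict[int, int] = {}
    order = 0  # 0 means "not inside an n-gram section"
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # blank lines are optional separators, so just skip them
        m = COUNT.match(line)
        if m and order == 0:
            declared[int(m.group(1))] = int(m.group(2))
            continue
        m = SECTION.match(line)
        if m:
            # A new header closes the previous block, blank line or not.
            order = int(m.group(1))
            found[order] = 0
            continue
        if line.startswith("\\"):  # \data\ or \end\
            order = 0
            continue
        if order:
            found[order] += 1
    if found != declared:
        raise ValueError(f"header declares {declared}, file contains {found}")
    return found


# Hypothetical toy ARPA text with no blank line after any block.
toy = ("\\data\\\nngram 1=2\nngram 2=1\n"
       "\\1-grams:\n-99\t<s>\t-0.2\n-0.5\t</s>\n"
       "\\2-grams:\n-0.1\t<s> </s>\n"
       "\\end\\\n")
print(count_ngrams(io.StringIO(toy)))  # {1: 2, 2: 1}
```

The design choice is simply to key section boundaries off the header lines themselves, so files with and without blank separators parse identically.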
>>>>>>>
>>>>>>> Is there a way to train the ARPA-formatted LM with KenLM, or do we need
>>>>>>> to train with another tool, like SRILM, or convert IRSTLM output to full
>>>>>>> ARPA format?
>>>>>>>
>>>>>>> Thanks again,
>>>>>>> Tom
>>>>>>>
>>>>>>> On Mon, 18 Oct 2010 20:31:38 -0400, Kenneth Heafield <mo...@kheafield.com> wrote:
>>>>>>>> Hi Moses,
>>>>>>>>
>>>>>>>> Introducing kenlm in Moses trunk. You no longer need to download a
>>>>>>>> separate language model to use Moses; it's distributed with Moses and
>>>>>>>> compiled in by default on UNIX. This is thread-safe language model
>>>>>>>> inference code that returns the same probabilities as SRI (up to
>>>>>>>> floating-point rounding). It loads ARPA files in 2/3 the time SRI takes
>>>>>>>> and uses less memory too. Using kenlm is simple: in your [lmodel-file]
>>>>>>>> section, change the first digit to 8. For example,
>>>>>>>>
>>>>>>>> "0 0 2 foo.arpa" changes to "8 0 2 foo.arpa"
>>>>>>>>
>>>>>>>> For even faster loading, use the binary format:
>>>>>>>>
>>>>>>>> kenlm/build_binary foo.arpa foo.binary
>>>>>>>>
>>>>>>>> then simply provide the binary filename in your moses.ini, e.g.
>>>>>>>> "8 0 2 foo.binary"; it auto-detects binary files using magic bytes at
>>>>>>>> the beginning.
>>>>>>>>
>>>>>>>> The code is ready for use and provides correct results. Inference is
>>>>>>>> slower than it should be due to inefficiencies in the Moses-side wrapper
>>>>>>>> code (it does a vocab lookup for all 5 words every time). I'm working
>>>>>>>> on it, and once this is done I'll post some benchmarks against SRI and
>>>>>>>> IRST. The binary format is subject to change, but it contains a version
>>>>>>>> number, so on the very rare occasions when it changes, new versions will
>>>>>>>> tell you to rebuild your binary files.
>>>>>>>> Windows is currently not supported (it uses mmap), though I welcome
>>>>>>>> contributions using #ifdef and CreateFileMapping.
>>>>>>>>
>>>>>>>> Have fun and let me know about your experiences with it.
>>>>>>>>
>>>>>>>> "Ken"
>>>>>>>> _______________________________________________
>>>>>>>> Moses-support mailing list
>>>>>>>> Moses-support@mit.edu
>>>>>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
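Ken's announcement says switching a Moses language model to KenLM only requires changing the first digit of its [lmodel-file] entry to 8. For a moses.ini with several language models, that edit can be scripted. This is a minimal Python sketch, not a Moses tool: use_kenlm is a hypothetical helper, and it assumes the old-style moses.ini layout in which sections start with a bracketed header, each [lmodel-file] line begins with the implementation digit, and lines starting with # are comments. Whitespace inside a rewritten entry is normalized to single spaces.

```python
def use_kenlm(ini_text: str) -> str:
    """Rewrite every [lmodel-file] entry of a moses.ini so its first field is 8 (KenLM).

    The remaining fields (e.g. the "0 2 foo.arpa" in "0 0 2 foo.arpa") are kept.
    """
    out = []
    in_lmodel = False
    for line in ini_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # Section header: track whether we are inside [lmodel-file].
            in_lmodel = stripped == "[lmodel-file]"
        elif in_lmodel and stripped and not stripped.startswith("#"):
            fields = stripped.split()
            fields[0] = "8"  # first digit selects the language model implementation
            line = " ".join(fields)
        out.append(line)
    return "\n".join(out) + "\n"


ini = "[lmodel-file]\n0 0 2 foo.arpa\n\n[weight-l]\n0.5\n"
print(use_kenlm(ini), end="")
# [lmodel-file]
# 8 0 2 foo.arpa
#
# [weight-l]
# 0.5
```

Because only the first field changes, the same entry keeps working after running kenlm/build_binary: point the filename at foo.binary, as in Ken's "8 0 2 foo.binary" example, and KenLM detects the binary format from the file itself.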