The error message means that <unk> will be given log probability 0 (that is, probability 1), thereby favoring unknown words. Many people train closed-vocabulary models and include the target side of the parallel data. However, a word passed through from the source (a phrase table OOV) may or may not be an OOV with respect to the language model, so a pass-through feature is not equivalent to an OOV feature. Ideally Moses would have an OOV counter feature, thereby allowing MERT to tune the probability of <unk>. Failing that, I think it's generally a good idea to train open-vocabulary LMs.
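To make the effect concrete, here is a rough Python sketch, not KenLM or Moses code. It reads the unigram section of a toy ARPA file and scores words with a unigram-only model, substituting log probability 0 when <unk> is missing, as the error message describes. The function names, the toy model contents, and the word "xyzzy" are made up for illustration; real decoding uses higher-order n-grams and backoff.

```python
def unigram_logprobs(arpa_lines):
    """Collect {word: log10 probability} from the \\1-grams: section of ARPA text."""
    probs = {}
    in_unigrams = False
    for line in arpa_lines:
        line = line.strip()
        if line.startswith("\\"):
            # Section markers look like \data\, \1-grams:, \2-grams:, \end\
            in_unigrams = (line == "\\1-grams:")
            continue
        if in_unigrams and line:
            fields = line.split()
            probs[fields[1]] = float(fields[0])
    return probs


def score(words, probs):
    """Toy unigram score: sum of log10 probabilities.

    An unseen word gets the model's <unk> entry, or 0.0 if the model has
    none (the substitution reported in the error message).
    """
    unk = probs.get("<unk>", 0.0)
    return sum(probs.get(w, unk) for w in words)


# Hypothetical closed-vocabulary model: no <unk> entry.
CLOSED = r"""
\data\
ngram 1=3

\1-grams:
-1.0 </s>
-99 <s>
-0.5 haus

\end\
"""

# The same toy model with an <unk> entry, as an open-vocabulary LM would have.
OPEN = CLOSED.replace("-0.5 haus", "-0.5 haus\n-2.0 <unk>")

closed = unigram_logprobs(CLOSED.splitlines())
open_vocab = unigram_logprobs(OPEN.splitlines())

# Without <unk>, the unknown word outscores the known one.
print(score(["xyzzy"], closed), score(["haus"], closed))
# With <unk>, the unknown word is penalized instead.
print(score(["xyzzy"], open_vocab), score(["haus"], open_vocab))
```

The same unigram scan can also be used on its own to check whether an existing ARPA file contains <unk> before converting it.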

On 02/07/11 03:35, Joerg Tiedemann wrote:
> Monday morning ... I used the 32-bit version of build_binary
> My fault -- Sorry for the confusion.
>
> Jörg
>
>
> On Mon, Feb 7, 2011 at 9:17 AM, Joerg Tiedemann <[email protected]> wrote:
>> Great - using kenlm seems to work. It looks like <unk> is responsible
>> for the trouble. At least that's the only complaint I've seen when
>> loading with kenlm. I forgot the '-unk' flag in ngram-count. Is that a
>> big problem? I don't want to re-run the lm-training ...
>>
>> One more thing: kenlm/build_binary crashes (because of the missing <unk>?)
>>
>> Reading
>> /home/staff/joerg/projects/UUMT/wmt11/data/training-monolingual/news.shuffled.low.de.lm
>> ----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
>> Language model is missing <unk>. Substituting probability 0.
>> *make: ***
>> [/home/staff/joerg/projects/UUMT/wmt11/data/training-monolingual/news.shuffled.low.de.kenlm]
>> Segmentation fault
>> make: *** Deleting file
>> `/home/staff/joerg/projects/UUMT/wmt11/data/training-monolingual/news.shuffled.low.de.kenlm'
>>
>> Thanks again,
>>
>> Jörg
>>
>>
>> On Mon, Feb 7, 2011 at 12:08 AM, Kenneth Heafield <[email protected]>
>> wrote:
>>> The first error you report (body != 0) means malloc returned 0. That's
>>> an out of memory condition (or a bug in SRI asking for 0 memory). Are
>>> you compiling 32-bit or running with any other hard limit on RAM?
>>>
>>> Don't know what your second error is.
>>>
>>> Try kenlm. It uses less memory and has more informative error messages.
>>>
>>> Kenneth
>>>
>>> On 02/06/11 17:57, Joerg Tiedemann wrote:
>>>> Hi,
>>>>
>>>> I have a problem loading LMs generated from the news.shuffled data
>>>> sets. The decoder dies with this message:
>>>>
>>>> Start loading LanguageModel
>>>> /home/staff/joerg/projects/UUMT/wmt11/data/training-monolingual/news.shuffled.low.de.lm
>>>> : [138.000] seconds
>>>> moses: ../../include/LHash.cc:138: void LHash<KeyT,
>>>> DataT>::alloc(unsigned int) [with KeyT = unsigned int, DataT = float]:
>>>> Assertion `body != 0' failed.
>>>> sh: line 1: 1692 Aborted
>>>> /home/staff/joerg/projects/LetsMT/tools32/mosesdecoder/moses-cmd/src/moses
>>>> -threads 4 -config filtered/moses.ini -inputtype 0 -w -0.217387 -lm
>>>> 0.036238 0.036238 0.036238 -d 0.065216 0.065216 0.065216 0.065216
>>>> 0.065216 0.065216 0.065216 -tm 0.043477 0.043477 0.043477 0.043477
>>>> 0.043477 -n-best-list run1.best100.out 100 -input-file
>>>> /home/staff/joerg/projects/UUMT/wmt11/data/dev/newstest2009-src.low.en
>>>>> run1.out
>>>>
>>>>
>>>> The LM is big but I don't think that memory is the problem. I have
>>>> also a similar problem with a smaller Czech LM (but a different
>>>> message):
>>>>
>>>>
>>>> Start loading LanguageModel
>>>> /home/staff/joerg/projects/UUMT/wmt11/data/training-monolingual/news.shuffled.low.cs.lm
>>>> : [36.000] seconds
>>>> Unexpected error.
>>>> sh: line 1: 6737 Aborted
>>>> /home/staff/joerg/projects/LetsMT/tools32/mosesdecoder/moses-cmd/src/moses
>>>> -threads 4 -config filtered/moses.ini -inputtype 0 -w -0.217387 -lm
>>>> 0.036238 0.036238 0.036238 -d 0.065216 0.065216 0.065216 0.065216
>>>> 0.065216 0.065216 0.065216 -tm 0.043477 0.043477 0.043477 0.043477
>>>> 0.043477 -n-best-list run1.best100.out 100 -input-file
>>>> /home/staff/joerg/projects/UUMT/wmt11/data/dev/newstest2009-src.low.en
>>>>> run1.out
>>>> Exit code: 134
>>>>
>>>>
>>>> Any ideas?
>>>> Thanks,
>>>>
>>>> Jörg
>>>>
>>>>
>>> _______________________________________________
>>> Moses-support mailing list
>>> [email protected]
>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>
>>
>>
>>
>> --
>> **********************************************************************************
>> Jörg Tiedemann
>> http://stp.lingfil.uu.se/~joerg/
>>
