The error message means that <unk> will be given log probability 0
(that is, probability 1), thereby favoring unknown words.  Many people
train closed-vocabulary models and include the target side of the
parallel data.  However, a word passed through from the source (a phrase
table OOV) may or may not be an OOV with respect to the language model,
so a pass-through feature is not equivalent to an OOV feature.  Ideally
Moses would have an OOV counter feature, thereby allowing MERT to tune
the probability of <unk>.  Failing that, I think it's generally a good
idea to train open-vocabulary LMs.
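
To make the point concrete, here is a minimal sketch with a toy unigram
model.  It is not KenLM or Moses code; the vocabulary, the log10 values,
the -5.0 stand-in for a trained <unk>, and the names score() and
count_lm_oovs() are all made up for illustration.

```python
# Toy unigram log10 probabilities (invented values, not from a real LM).
LM = {"the": -1.0, "house": -2.5, "is": -1.5, "small": -3.0, "Berlin": -3.5}

def score(tokens, unk_logprob):
    """Sum log10 probabilities, using unk_logprob for words missing from LM."""
    return sum(LM.get(tok, unk_logprob) for tok in tokens)

def count_lm_oovs(tokens):
    """Count words unknown to the LM (what an OOV counter feature would report)."""
    return sum(1 for tok in tokens if tok not in LM)

known = ["the", "house", "is", "small"]
garbage = ["xyzzy", "qwerty", "is", "small"]

# <unk> missing from the model, so log probability 0 is substituted:
# the hypothesis full of unknown words scores higher than the fluent one.
print(score(known, 0.0), score(garbage, 0.0))    # -8.0 -4.5

# With a hypothetical trained <unk> value the ranking flips.
print(score(known, -5.0), score(garbage, -5.0))  # -8.0 -14.5

# A pass-through word is not necessarily an LM OOV: assume "Berlin" was
# missing from the phrase table and copied from the source, yet the LM knows it.
passed_through = ["Berlin"]
print(len(passed_through), count_lm_oovs(passed_through))  # 1 0
```

The last line is why counting pass-throughs cannot stand in for counting
LM OOVs: the two counts differ whenever the LM has seen a word that the
phrase table has not.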

On 02/07/11 03:35, Joerg Tiedemann wrote:
> Monday morning ... I used the 32-bit version of build_binary
> My fault -- Sorry for the confusion.
> 
> Jörg
> 
> 
> On Mon, Feb 7, 2011 at 9:17 AM, Joerg Tiedemann <[email protected]> wrote:
>> Great - using kenlm seems to work. It looks like <unk> is responsible
>> for the trouble. At least that's the only complaint I've seen when
>> loading with kenlm. I forgot the '-unk' flag in ngram-count. Is that a
>> big problem? I don't want to re-run the lm-training ...
>>
>> One more thing: kenlm/build_binary crashes (because of the missing <unk>?)
>>
>> Reading 
>> /home/staff/joerg/projects/UUMT/wmt11/data/training-monolingual/news.shuffled.low.de.lm
>> ----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
>> Language model is missing <unk>.  Substituting probability 0.
>> *make: *** 
>> [/home/staff/joerg/projects/UUMT/wmt11/data/training-monolingual/news.shuffled.low.de.kenlm]
>> Segmentation fault
>> make: *** Deleting file
>> `/home/staff/joerg/projects/UUMT/wmt11/data/training-monolingual/news.shuffled.low.de.kenlm'
>>
>> Thanks again,
>>
>> Jörg
>>
>>
>> On Mon, Feb 7, 2011 at 12:08 AM, Kenneth Heafield <[email protected]> 
>> wrote:
>>> The first error you report (body != 0) means malloc returned 0.  That's
>>> an out of memory condition (or a bug in SRI asking for 0 memory).  Are
>>> you compiling 32-bit or running with any other hard limit on RAM?
>>>
>>> Don't know what your second error is.
>>>
>>> Try kenlm.  It uses less memory and has more informative error messages.
>>>
>>> Kenneth
>>>
>>> On 02/06/11 17:57, Joerg Tiedemann wrote:
>>>> Hi,
>>>>
>>>> I have a problem loading LMs generated from the news.shuffled data
>>>> sets. The decoder dies with this message:
>>>>
>>>> Start loading LanguageModel
>>>> /home/staff/joerg/projects/UUMT/wmt11/data/training-monolingual/news.shuffled.low.de.lm
>>>> : [138.000] seconds
>>>> moses: ../../include/LHash.cc:138: void LHash<KeyT,
>>>> DataT>::alloc(unsigned int) [with KeyT = unsigned int, DataT = float]:
>>>> Assertion `body != 0' failed.
>>>> sh: line 1:  1692 Aborted
>>>> /home/staff/joerg/projects/LetsMT/tools32/mosesdecoder/moses-cmd/src/moses
>>>> -threads 4 -config filtered/moses.ini -inputtype 0 -w -0.217387 -lm
>>>> 0.036238 0.036238 0.036238 -d 0.065216 0.065216 0.065216 0.065216
>>>> 0.065216 0.065216 0.065216 -tm 0.043477 0.043477 0.043477 0.043477
>>>> 0.043477 -n-best-list run1.best100.out 100 -input-file
>>>> /home/staff/joerg/projects/UUMT/wmt11/data/dev/newstest2009-src.low.en
>>>>> run1.out
>>>>
>>>>
>>>> The LM is big but I don't think that memory is the problem. I have
>>>> also a similar problem with a smaller Czech LM (but a different
>>>> message):
>>>>
>>>>
>>>> Start loading LanguageModel
>>>> /home/staff/joerg/projects/UUMT/wmt11/data/training-monolingual/news.shuffled.low.cs.lm
>>>> : [36.000] seconds
>>>> Unexpected error.
>>>> sh: line 1:  6737 Aborted
>>>> /home/staff/joerg/projects/LetsMT/tools32/mosesdecoder/moses-cmd/src/moses
>>>> -threads 4 -config filtered/moses.ini -inputtype 0 -w -0.217387 -lm
>>>> 0.036238 0.036238 0.036238 -d 0.065216 0.065216 0.065216 0.065216
>>>> 0.065216 0.065216 0.065216 -tm 0.043477 0.043477 0.043477 0.043477
>>>> 0.043477 -n-best-list run1.best100.out 100 -input-file
>>>> /home/staff/joerg/projects/UUMT/wmt11/data/dev/newstest2009-src.low.en
>>>>> run1.out
>>>> Exit code: 134
>>>>
>>>>
>>>> Any ideas?
>>>> Thanks,
>>>>
>>>> Jörg
>>>>
>>>>
>>> _______________________________________________
>>> Moses-support mailing list
>>> [email protected]
>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>
>>
>>
>>
>> --
>> **********************************************************************************
>> Jörg Tiedemann                                 
>> http://stp.lingfil.uu.se/~joerg/
>>
> 
> 
> 