Hi Philipp,

Thanks for your advice. Maybe I've done something wrong, although I 
followed Moses' documentation guidelines.

First, I compiled a separate Moses build using '--with-irstlm'.
Next, I ran the following to get a binarized version of my SRI
language model:
    $ ./compile-lm corpus.ca.lm ca.blm

Then I updated my moses.ini with the new settings:
    1 0 5 /home/esca/ESCA/lm/ca.blm
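
As a rough illustration of how that lmodel-file entry is laid out, here
is a minimal Python sketch that splits it into its four fields. It
assumes, based on the Moses documentation of that time as I understand
it, that the first field selects the LM implementation (0 = SRILM,
1 = IRSTLM), the second is the factor, and the third is the n-gram
order. The names LM_IMPLEMENTATIONS, LModelEntry and parse_lmodel_line
are made up for this example and are not part of Moses.

```python
from typing import NamedTuple

# Assumption: first field of an lmodel-file entry picks the LM implementation.
# Only the two values that appear in this thread are listed.
LM_IMPLEMENTATIONS = {0: "SRILM", 1: "IRSTLM"}


class LModelEntry(NamedTuple):
    implementation: str  # human-readable name of the LM implementation
    factor: int          # factor the LM is applied to
    order: int           # n-gram order
    path: str            # path to the language model file


def parse_lmodel_line(line: str) -> LModelEntry:
    """Split one lmodel-file entry ("type factor order path") into fields."""
    fields = line.split(None, 3)
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
    lm_type, factor, order = (int(value) for value in fields[:3])
    implementation = LM_IMPLEMENTATIONS.get(lm_type, f"unknown ({lm_type})")
    return LModelEntry(implementation, factor, order, fields[3].strip())


if __name__ == "__main__":
    # The two entries tried in this message.
    for entry in ("1 0 5 /home/esca/ESCA/lm/ca.blm",
                  "0 0 5 /home/esca/ESCA/lm/ca.blm"):
        print(parse_lmodel_line(entry))
```

Under that assumption, the only difference between the entry that
crashes and the one that works is the first field.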

Finally, I ran the Moses binary compiled with IRSTLM support and got the
'segmentation fault' error.


I managed to run the binarized SRI model in the following way:

After 'compile-lm' I updated moses.ini:
    0 0 5 /home/esca/ESCA/lm/ca.blm

And then I ran moses (compiled with SRILM) without any errors.
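
To double-check which kind of file a given LM path points to, something
like the following Python sketch could be used. It assumes, going only
by the 'blmt' line printed by the loader in the log quoted further down,
that a file produced by compile-lm starts with the bytes "blmt"; this is
an assumption, not a documented guarantee. The function name
looks_like_irstlm_binary and the demo file contents are invented for
illustration.

```python
import os
import tempfile

# Assumption: compile-lm output begins with this marker (taken from the
# "blmt" line in the decoder log, not from any format specification).
IRSTLM_BINARY_MARKER = b"blmt"


def looks_like_irstlm_binary(path: str) -> bool:
    """Return True if the file starts with the assumed binary marker."""
    with open(path, "rb") as handle:
        return handle.read(len(IRSTLM_BINARY_MARKER)) == IRSTLM_BINARY_MARKER


if __name__ == "__main__":
    # Demo on throwaway files; only the leading bytes matter here,
    # the rest of the content is a placeholder.
    with tempfile.TemporaryDirectory() as tmp:
        binary_like = os.path.join(tmp, "demo.blm")
        text_like = os.path.join(tmp, "demo.lm")
        with open(binary_like, "wb") as out:
            out.write(b"blmt placeholder\n")
        with open(text_like, "wb") as out:
            out.write(b"\\data\\\n")
        print(binary_like, looks_like_irstlm_binary(binary_like))
        print(text_like, looks_like_irstlm_binary(text_like))
```

For the setup above it would be called as
looks_like_irstlm_binary("/home/esca/ESCA/lm/ca.blm").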


I thought binarized language models had to be decoded with the IRST 
compiled version of Moses. Am I wrong?

Regards,
   
    Miguel

Philipp Koehn wrote:
> Hi,
>
> To use the binarized IRST LM, you just need to compile the SRILM LM,
> no need to train the model with IRST tools. See Moses documentation
> for details.
>
> -phi
>
> On Tue, Jul 22, 2008 at 12:31 PM, Miguel José Hernández Vidal
> <[EMAIL PROTECTED]> wrote:
>   
>> I've also tried to run moses with a binarized (with compile-lm) SRI
>> language model. When I run the decoder I see a segmentation fault error:
>>
>> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>> [EMAIL PROTECTED]:~$ ~/moses/moses-cmd/src/moses -config 
>> ~/ESCA/model/moses.ini
>> -input-file ~/ESCA/tuning/input > ~/ESCA/evaluation/output
>> Defined parameters (per moses.ini or switch):
>>        config: /home/esca/ESCA/model/moses.ini
>>        distortion-file: 0-0 msd-bidirectional-fe 6
>> /home/esca/ESCA/model/reordering
>>        distortion-limit: 6
>>        input-factors: 0
>>        input-file: /home/esca/ESCA/tuning/input
>>        lmodel-file: 1 0 5 /home/esca/ESCA/lm/ca.blm
>>        mapping: 0 T 0
>>        ttable-file: 0 0 5 /home/esca/ESCA/model/phrase-table
>>        ttable-limit: 20
>>        weight-d: 0.3 0.3 0.3 0.3 0.3 0.3 0.3
>>        weight-l: 0.5000
>>        weight-t: 0.2 0.2 0.2 0.2 0.2
>>        weight-w: -1
>> Loading lexical distortion models...
>> have 1 models
>> Creating lexical reordering...
>> weights: 0.300 0.300 0.300 0.300 0.300 0.300
>> binary file loaded, default OFF_T: -1
>> Created lexical orientation reordering
>> Start loading LanguageModel /home/esca/ESCA/lm/ca.blm : [1.000] seconds
>> In LanguageModelIRST::Load: nGramOrder = 5
>> Loading LM file (no MAP)
>> blmt
>> loadbin()
>> loading 321187 1-grams
>> loading 4548952 2-grams
>> loading 2785668 3-grams
>> loading 2501764 4-grams
>> loading 1741048 5-grams
>> done
>> OOV code is 37189
>> IRST: m_unknownId=37189
>> Fallo de segmentación (core dumped) #SEGMENTATION FAULT
>> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>
>> I am using binarized phrase and reordering tables, but they worked fine
>> when I built them with my old SRILM system.
>>
>> Thanks for your help.
>>
>> Regards,
>>
>>             Miguel
>>
>> Miguel José Hernández Vidal wrote:
>>     
>>> Hi all,
>>>
>>> I am trying to build my lm with IRST toolkit. First, I've added <s>
>>> tags with 'add-start-end.sh' and, obviously, have my data tokenized &
>>> lowercased.
>>>
>>> When I run 'build-lm.sh' it looks like it works fine, but at the end
>>> of the process no output file is found. Here's the log:
>>>
>>> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>>
>>> [EMAIL PROTECTED]:~/irstlm/bin$ bash build-lm.sh -i ~/corpus/tag.es -o
>>> ~/corpus/ca.lm -n 3 -k 5 -s kneser-ney
>>> Cleaning temporary directory stat
>>> Extracting dictionary from training corpus
>>> Splitting dictionary into 5 lists
>>> Extracting n-gram statistics for each word list
>>> dict.000
>>> dict.001
>>> dict.002
>>> dict.003
>>> dict.004
>>> Estimating language models for each word list
>>> dict.000
>>> dict.001
>>> dict.002
>>> dict.003
>>> dict.004
>>> Merging language models into /home/esca/corpus/ca.lm
>>> Cleaning temporary directory stat
>>> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>>
>>>
>>> I've tried with different corpus sizes, but that didn't work either.
>>> By the way, I am running the scripts under Ubuntu 7.04 32-bit.
>>>
>>> Regards,
>>>
>>>                Miguel
>>>
>>>       
>> _______________________________________________
>> Moses-support mailing list
>> [email protected]
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>
>>
>>     
>
>   
