Hi Philipp,
Thanks for your advice. Maybe I've done something wrong, although I
followed the guidelines in the Moses documentation.
First, I compiled a separate Moses build with '--with-irstlm'.
Next, I ran the following to produce a binarized version of my SRI
language model:
$ ./compile-lm corpus.ca.lm ca.blm
Then I updated my moses.ini with the new settings:
1 0 5 /home/esca/ESCA/lm/ca.blm
Finally, I ran the IRSTLM-compiled Moses and got the 'segmentation
fault' error.
I did manage to run the binarized SRI model in the following way.
After 'compile-lm', I updated moses.ini:
0 0 5 /home/esca/ESCA/lm/ca.blm
And then I ran moses (compiled with SRILM) without any errors.
I thought binarized language models had to be decoded with the
IRSTLM-compiled version of Moses. Am I wrong?
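For reference, this is how I read the lmodel-file line from the Moses
documentation (the comments below are my own understanding of the fields,
not something I have verified in the source):

```
[lmodel-file]
# <implementation> <factor> <order> <path>
# implementation: 0 = SRILM, 1 = IRSTLM
1 0 5 /home/esca/ESCA/lm/ca.blm
```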
Regards,
Miguel
Philipp Koehn wrote:
> Hi,
>
> To use the binarized IRST LM, you just need to compile the SRILM LM,
> no need to train the model with IRST tools. See Moses documentation
> for details.
>
> -phi
>
> On Tue, Jul 22, 2008 at 12:31 PM, Miguel José Hernández Vidal
> <[EMAIL PROTECTED]> wrote:
>
>> I've also tried to run Moses with an SRI language model binarized
>> with compile-lm. When I run the decoder, I get a segmentation fault:
>>
>> ----------------------------------------------------------------------
>> [EMAIL PROTECTED]:~$ ~/moses/moses-cmd/src/moses -config
>> ~/ESCA/model/moses.ini
>> -input-file ~/ESCA/tuning/input > ~/ESCA/evaluation/output
>> Defined parameters (per moses.ini or switch):
>> config: /home/esca/ESCA/model/moses.ini
>> distortion-file: 0-0 msd-bidirectional-fe 6
>> /home/esca/ESCA/model/reordering
>> distortion-limit: 6
>> input-factors: 0
>> input-file: /home/esca/ESCA/tuning/input
>> lmodel-file: 1 0 5 /home/esca/ESCA/lm/ca.blm
>> mapping: 0 T 0
>> ttable-file: 0 0 5 /home/esca/ESCA/model/phrase-table
>> ttable-limit: 20
>> weight-d: 0.3 0.3 0.3 0.3 0.3 0.3 0.3
>> weight-l: 0.5000
>> weight-t: 0.2 0.2 0.2 0.2 0.2
>> weight-w: -1
>> Loading lexical distortion models...
>> have 1 models
>> Creating lexical reordering...
>> weights: 0.300 0.300 0.300 0.300 0.300 0.300
>> binary file loaded, default OFF_T: -1
>> Created lexical orientation reordering
>> Start loading LanguageModel /home/esca/ESCA/lm/ca.blm : [1.000] seconds
>> In LanguageModelIRST::Load: nGramOrder = 5
>> Loading LM file (no MAP)
>> blmt
>> loadbin()
>> loading 321187 1-grams
>> loading 4548952 2-grams
>> loading 2785668 3-grams
>> loading 2501764 4-grams
>> loading 1741048 5-grams
>> done
>> OOV code is 37189
>> IRST: m_unknownId=37189
>> Fallo de segmentación (core dumped) #SEGMENTATION FAULT
>> ----------------------------------------------------------------------
>>
>> I am using binarized phrase and reordering tables, but they worked fine
>> when I built them with my old SRILM setup.
>>
>> Thanks for your help.
>>
>> Regards,
>>
>> Miguel
>>
>> Miguel José Hernández Vidal wrote:
>>
>>> Hi list,
>>>
>>> I am trying to build my LM with the IRST toolkit. First, I added <s>
>>> tags with 'add-start-end.sh', and my data is, of course, tokenized
>>> and lowercased.
>>>
>>> When I run 'build-lm.sh' it looks like it works fine, but at the end
>>> of the process no output file is found. Here's the log:
>>>
>>> ----------------------------------------------------------------------
>>>
>>> [EMAIL PROTECTED]:~/irstlm/bin$ bash build-lm.sh -i ~/corpus/tag.es -o
>>> ~/corpus/ca.lm -n 3 -k 5 -s kneser-ney
>>> Cleaning temporary directory stat
>>> Extracting dictionary from training corpus
>>> Splitting dictionary into 5 lists
>>> Extracting n-gram statistics for each word list
>>> dict.000
>>> dict.001
>>> dict.002
>>> dict.003
>>> dict.004
>>> Estimating language models for each word list
>>> dict.000
>>> dict.001
>>> dict.002
>>> dict.003
>>> dict.004
>>> Merging language models into /home/esca/corpus/ca.lm
>>> Cleaning temporary directory stat
>>> ----------------------------------------------------------------------
>>>
>>>
>>> I've tried different corpus sizes, but that didn't work either.
>>> BTW, I am running the scripts on Ubuntu 7.04 32-bit.
>>>
>>> Regards,
>>>
>>> Miguel
>>>
>>>
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support