Hi,

This is very weird. You are using the 'irstlm/src/compile-lm' command, aren't you?
I was a bit confused at first (actually I still am), because there is also
an SRILM binary format.
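
One way to see which kind of file ca.blm really is would be to look at its first bytes. The decoder log quoted below prints "blmt" while loading, which looks like the header tag of an IRSTLM binary table. Here is a rough sketch, assuming the file starts with that tag and that a plain ARPA text file has a "\data\" line near the top. The function name lm_kind is made up for illustration, and other formats (including the SRILM binary one) are simply reported as unknown.

```shell
#!/bin/sh
# Guess a language-model file's format from its first bytes.
# Assumptions (not confirmed in this thread): IRSTLM binary files begin
# with the tag "blmt", and ARPA text files have a "\data\" line within
# their first few lines. lm_kind is a hypothetical helper name.
lm_kind() {
    file="$1"
    # Read only the first 4 bytes; tr strips NULs so binary data is safe
    # inside the command substitution.
    tag=$(head -c 4 "$file" | tr -d '\0')
    if [ "$tag" = "blmt" ]; then
        echo "irstlm-binary"
    elif head -n 5 "$file" | grep -q '^\\data\\'; then
        echo "arpa-text"
    else
        echo "unknown"
    fi
}
```

Calling lm_kind /home/esca/ESCA/lm/ca.blm would then print one of the three labels. If it says irstlm-binary, I would expect the file to go with LM type 1 (IRSTLM) in the first field of the moses.ini line, while type 0 selects SRILM.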

-phi

On Wed, Jul 23, 2008 at 10:50 AM, Miguel José Hernández Vidal
<[EMAIL PROTECTED]> wrote:
> Hi Philipp,
>
> Thanks for your advice. Maybe I've done something wrong, although I followed
> Moses' documentation guidelines.
>
> First, I compiled a separate Moses build with '--with-irstlm'.
> Next, I ran the following to get a binarized version of my SRI
> language model:
>   $ ./compile-lm corpus.ca.lm ca.blm
>
> Then I updated my moses.ini with the new settings:
>   1 0 5 /home/esca/ESCA/lm/ca.blm
>
> Finally, I ran the Moses build compiled with IRSTLM and got the
> 'segmentation fault' error.
>
>
> I managed to run the binarized SRI model in the following way:
>
> After 'compile-lm' I updated moses.ini:
>   0 0 5 /home/esca/ESCA/lm/ca.blm
>
> And then I ran moses (compiled with SRILM) without any errors.
>
>
> I thought binarized language models had to be decoded with the IRST compiled
> version of Moses. Am I wrong?
>
> Regards,
>     Miguel
>
> Philipp Koehn wrote:
>>
>> Hi,
>>
>> To use the binarized IRST LM, you just need to compile the SRILM LM,
>> no need to train the model with IRST tools. See Moses documentation
>> for details.
>>
>> -phi
>>
>> On Tue, Jul 22, 2008 at 12:31 PM, Miguel José Hernández Vidal
>> <[EMAIL PROTECTED]> wrote:
>>
>>>
>>> I've also tried to run moses with a binarized (with compile-lm) SRI
>>> language model. When I run the decoder I see a segmentation fault error:
>>>
>>>
>>> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>> [EMAIL PROTECTED]:~$ ~/moses/moses-cmd/src/moses -config ~/ESCA/model/moses.ini -input-file ~/ESCA/tuning/input > ~/ESCA/evaluation/output
>>> Defined parameters (per moses.ini or switch):
>>>       config: /home/esca/ESCA/model/moses.ini
>>>       distortion-file: 0-0 msd-bidirectional-fe 6 /home/esca/ESCA/model/reordering
>>>       distortion-limit: 6
>>>       input-factors: 0
>>>       input-file: /home/esca/ESCA/tuning/input
>>>       lmodel-file: 1 0 5 /home/esca/ESCA/lm/ca.blm
>>>       mapping: 0 T 0
>>>       ttable-file: 0 0 5 /home/esca/ESCA/model/phrase-table
>>>       ttable-limit: 20
>>>       weight-d: 0.3 0.3 0.3 0.3 0.3 0.3 0.3
>>>       weight-l: 0.5000
>>>       weight-t: 0.2 0.2 0.2 0.2 0.2
>>>       weight-w: -1
>>> Loading lexical distortion models...
>>> have 1 models
>>> Creating lexical reordering...
>>> weights: 0.300 0.300 0.300 0.300 0.300 0.300
>>> binary file loaded, default OFF_T: -1
>>> Created lexical orientation reordering
>>> Start loading LanguageModel /home/esca/ESCA/lm/ca.blm : [1.000] seconds
>>> In LanguageModelIRST::Load: nGramOrder = 5
>>> Loading LM file (no MAP)
>>> blmt
>>> loadbin()
>>> loading 321187 1-grams
>>> loading 4548952 2-grams
>>> loading 2785668 3-grams
>>> loading 2501764 4-grams
>>> loading 1741048 5-grams
>>> done
>>> OOV code is 37189
>>> IRST: m_unknownId=37189
>>> Fallo de segmentación (core dumped) #SEGMENTATION FAULT
>>>
>>> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>>
>>> I am using binarized phrase and reordering tables, but they worked fine
>>> when I built them with my old SRILM system.
>>>
>>> Thanks for your help.
>>>
>>> Regards,
>>>
>>>            Miguel
>>>
>>> Miguel José Hernández Vidal wrote:
>>>
>>>>
>>>> Hi list,
>>>>
>>>> I am trying to build my LM with the IRST toolkit. First, I added <s>
>>>> tags with 'add-start-end.sh'; my data is, of course, already tokenized
>>>> and lowercased.
>>>>
>>>> When I run 'build-lm.sh' it looks like it works fine, but at the end
>>>> of the process no output file is found. Here's the log:
>>>>
>>>>
>>>> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>>>
>>>> [EMAIL PROTECTED]:~/irstlm/bin$ bash build-lm.sh -i ~/corpus/tag.es -o
>>>> ~/corpus/ca.lm -n 3 -k 5 -s kneser-ney
>>>> Cleaning temporary directory stat
>>>> Extracting dictionary from training corpus
>>>> Splitting dictionary into 5 lists
>>>> Extracting n-gram statistics for each word list
>>>> dict.000
>>>> dict.001
>>>> dict.002
>>>> dict.003
>>>> dict.004
>>>> Estimating language models for each word list
>>>> dict.000
>>>> dict.001
>>>> dict.002
>>>> dict.003
>>>> dict.004
>>>> Merging language models into /home/esca/corpus/ca.lm
>>>> Cleaning temporary directory stat
>>>>
>>>> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>>>
>>>>
>>>> I've tried with different corpus sizes, but it didn't work either.
>>>> btw, I am running the scripts under Ubuntu 7.04 32bit.
>>>>
>>>> Regards,
>>>>
>>>>               Miguel
>>>>
>>>>
>>>
>>> _______________________________________________
>>> Moses-support mailing list
>>> [email protected]
>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>
>>>
>>>
>>
>>
>
>

