I've also tried running Moses with an SRI language model binarized with
compile-lm. When I run the decoder, it crashes with a segmentation fault:
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
[EMAIL PROTECTED]:~$ ~/moses/moses-cmd/src/moses -config ~/ESCA/model/moses.ini
-input-file ~/ESCA/tuning/input > ~/ESCA/evaluation/output
Defined parameters (per moses.ini or switch):
config: /home/esca/ESCA/model/moses.ini
distortion-file: 0-0 msd-bidirectional-fe 6
/home/esca/ESCA/model/reordering
distortion-limit: 6
input-factors: 0
input-file: /home/esca/ESCA/tuning/input
lmodel-file: 1 0 5 /home/esca/ESCA/lm/ca.blm
mapping: 0 T 0
ttable-file: 0 0 5 /home/esca/ESCA/model/phrase-table
ttable-limit: 20
weight-d: 0.3 0.3 0.3 0.3 0.3 0.3 0.3
weight-l: 0.5000
weight-t: 0.2 0.2 0.2 0.2 0.2
weight-w: -1
Loading lexical distortion models...
have 1 models
Creating lexical reordering...
weights: 0.300 0.300 0.300 0.300 0.300 0.300
binary file loaded, default OFF_T: -1
Created lexical orientation reordering
Start loading LanguageModel /home/esca/ESCA/lm/ca.blm : [1.000] seconds
In LanguageModelIRST::Load: nGramOrder = 5
Loading LM file (no MAP)
blmt
loadbin()
loading 321187 1-grams
loading 4548952 2-grams
loading 2785668 3-grams
loading 2501764 4-grams
loading 1741048 5-grams
done
OOV code is 37189
IRST: m_unknownId=37189
Fallo de segmentación (core dumped) #SEGMENTATION FAULT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
I am using binarized phrase and reordering tables, but they worked fine
when I built them with my old SRILM system.
Thanks for your help.
Regards,
Miguel
Miguel José Hernández Vidal wrote:
> Hi all,
>
> I am trying to build my LM with the IRST toolkit. First, I added <s>
> tags with 'add-start-end.sh', and of course my data is tokenized and
> lowercased.
>
> When I run 'build-lm.sh' it looks like it works fine, but at the end
> of the process no output file is found. Here's the log:
>
> -----------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>
> [EMAIL PROTECTED]:~/irstlm/bin$ bash build-lm.sh -i ~/corpus/tag.es -o
> ~/corpus/ca.lm -n 3 -k 5 -s kneser-ney
> Cleaning temporary directory stat
> Extracting dictionary from training corpus
> Splitting dictionary into 5 lists
> Extracting n-gram statistics for each word list
> dict.000
> dict.001
> dict.002
> dict.003
> dict.004
> Estimating language models for each word list
> dict.000
> dict.001
> dict.002
> dict.003
> dict.004
> Merging language models into /home/esca/corpus/ca.lm
> Cleaning temporary directory stat
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>
>
> I've tried different corpus sizes, but that didn't help either.
> By the way, I am running the scripts under 32-bit Ubuntu 7.04.
>
> Regards,
>
> Miguel
>