Dear all,

I ran into a problem while training a recaser. When I run the command

    ./mosesdecoder/scripts/recaser/train-recaser.perl --first-step 3 --dir model --corpus corpus.en --train-script ./mosesdecoder/scripts/training/train-model.perl

the resulting phrase table contains several seemingly identical translation options:

    naţională ||| Naţională ||| 1 ||| 0-0 ||| 30 30 ||| 30 |||
    naţională ||| Naţională ||| 1 ||| 0-0 ||| 36 36 ||| 36 |||
    naţională ||| Naţională ||| 1 ||| 0-0 ||| 39 39 ||| 39 |||
    naţională ||| Naţională ||| 1 ||| 0-0 ||| 4 4 ||| 4 |||

In addition, a segmentation fault occurs when the table is compressed to a compact phrase table with the processPhraseTableMin executable.

Could this be due to a missing encoding normalization step somewhere in the procedure? With a previous version of Moses, the same command yields just the single line

    naţională ||| Naţională ||| 1 1 1 1 ||| 0-0 ||| 109 109 109 ||| |||

Thanks,
Vito Mandorino

-- 
M. Vito MANDORINO, Chief Scientist
The Translation Trustee
1, Place Charles de Gaulle, 78180 Montigny-le-Bretonneux
Tel: +33 1 30 44 04 23   Mobile: +33 6 84 65 68 89
Website: www.linguacustodia.finance <http://www.linguacustodia.com/>
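
One way to test the normalization hypothesis is to compare the repeated entries at the code-point level. The four counts above (30 + 36 + 39 + 4) add up to the 109 reported by the older version, which is consistent with entries that used to be merged now being kept apart. The Python sketch below is only an illustration and is not part of Moses: it groups phrase-table lines by their NFC-normalized source and target phrases and reports whether the raw strings in each group really are identical. The function names and the path `model/phrase-table.gz` are assumptions made for the example.

```python
import gzip
import unicodedata
from collections import defaultdict


def split_fields(line):
    """Split a phrase-table line on the '|||' delimiter."""
    return [field.strip() for field in line.rstrip("\n").split("|||")]


def codepoints(text):
    """Render a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


def group_by_normalized_pair(lines, form="NFC"):
    """Group raw (source, target) pairs by their normalized form."""
    groups = defaultdict(list)
    for line in lines:
        fields = split_fields(line)
        if len(fields) < 2:
            continue  # not a phrase-table line
        raw = (fields[0], fields[1])
        key = tuple(unicodedata.normalize(form, f) for f in raw)
        groups[key].append(raw)
    return groups


def describe_groups(groups):
    """Report every phrase pair that occurs on more than one line."""
    lines_out = []
    for key, raws in groups.items():
        if len(raws) < 2:
            continue
        distinct = sorted(set(raws))
        if len(distinct) == 1:
            verdict = "identical code points (not a normalization issue)"
        else:
            verdict = "different code points (normalization issue likely)"
        lines_out.append(
            f"{key[0]} ||| {key[1]}: {len(raws)} entries, {verdict}"
        )
        for src, tgt in distinct:
            lines_out.append(f"    {codepoints(src)} ||| {codepoints(tgt)}")
    return lines_out


def check_file(path):
    """Run the check on a plain or gzip-compressed phrase table."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as handle:
        return describe_groups(group_by_normalized_pair(handle))


# Hypothetical usage; adjust the path to the actual model directory:
# for row in check_file("model/phrase-table.gz"):
#     print(row)
```

If the repeated entries turn out to have identical code points, the duplication more likely comes from how the table is sorted or consolidated than from the encoding. Note also that NFC does not unify ţ (U+0163, cedilla) with ț (U+021B, comma below), so a mix of those two characters would show up as separate groups rather than as one group with differing code points. The sketch keeps all phrase pairs in memory, which may be impractical for very large tables.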
_______________________________________________ Moses-support mailing list [email protected] http://mailman.mit.edu/mailman/listinfo/moses-support
