I am struggling with a Moses translation pipeline.

Here is the text1.txt file I would like to translate from French to English:
<g id="1">Les banques de la zone euro sont soumises :</g>
<g id="1">au ratio de capital lié à la détention d’actifs risqués (nous nous intéressons ici au crédit) ;</g> <g id="1">au ratio de levier, qui détermine le capital règlementaire à partir de la taille du bilan de la banque ;</g> <g id="1">au ratio de liquidité, qui impose aux banques de détenir en particulier des portefeuilles importants de titres publics.</g>

I run the following commands, and they all complete without errors:

/home/moses/mosesdecoder/scripts/tokenizer/normalize-punctuation.perl fr < text1.txt > text2.txt
/home/moses/matecat/matecat_util/code/tokenizer/deescape-special-chars.perl < text2.txt > text3.txt
/home/moses/matecat/matecat_util/code/tokenizer/tokenizer.perl -X -a -l fr < text3.txt > text4.txt
/home/moses/mosesdecoder/scripts/recaser/truecase.perl --model /home/moses/working/truecaser/truecase-model.1.fr < text4.txt > text5.txt
/home/moses/mosesdecoder/bin/moses -f /home/moses/working/tuning/moses.tuned.ini.1 < text5.txt > text6.txt

Then text6.txt contains:

<g id="1"> banks in the euro zone are subject :</g>
<g id="1"> ratio of capital linked to the detention of risky assets ( we are here to credit ;</g> ) <g id="1"> the leverage ratio , which determines the regulatory capital from the size of the balance sheet of the bank ;</g> <g id="1"> ratio of liquidity , which requires banks to hold especially important portfolios of securities .</g> public

But then neither the detokenizer nor the detruecaser gives me the correct output.
For example, "banks" at the start of the first segment does not get its uppercase B.

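A post-processing step along the following lines might work around the casing problem. It is only a minimal sketch, assuming the missing capital comes from the line starting with a tag rather than with a word, and that one line holds one sentence. The names capitalize_after_tags and main are hypothetical; they are not part of Moses or matecat_util.

```python
import re
import sys

# A leading run of XML-style tags and whitespace, followed by the first
# word character of the line. Assumption: tags never contain a literal ">".
_LEADING_TAGS = re.compile(r'^((?:\s*<[^>]+>)*\s*)(\w)')


def capitalize_after_tags(line: str) -> str:
    """Uppercase the first word character that follows any leading tags."""
    return _LEADING_TAGS.sub(
        lambda m: m.group(1) + m.group(2).upper(), line, count=1
    )


def main() -> None:
    """Filter stdin to stdout, one sentence per line (hypothetical helper)."""
    for line in sys.stdin:
        sys.stdout.write(capitalize_after_tags(line))
```

Saved as a script with a call to main() at the bottom, this could run as the last step after the detruecaser and detokenizer. It only touches the first word of each line, so segments that start in the middle of a line keep their case.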

I also looked at https://github.com/christianbuck/matecat_util/tree/master/python_server and https://github.com/christianbuck/matecat_util/tree/master/code/tags4moses, but had no luck.


_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
