I am struggling with a translation pipeline for text that contains inline tags.
Here is the text1.txt file that I would like to translate from FR to EN:
<g id="1">Les banques de la zone euro sont soumises :</g>
<g id="1">au ratio de capital lié à la détention d’actifs risqués (nous
nous intéressons ici au crédit) ;</g>
<g id="1">au ratio de levier, qui détermine le capital règlementaire à
partir de la taille du bilan de la banque ;</g>
<g id="1">au ratio de liquidité, qui impose aux banques de détenir en
particulier des portefeuilles importants de titres publics.</g>
The following commands all run without errors:
/home/moses/mosesdecoder/scripts/tokenizer/normalize-punctuation.perl fr \
  < text1.txt > text2.txt
/home/moses/matecat/matecat_util/code/tokenizer/deescape-special-chars.perl \
  < text2.txt > text3.txt
/home/moses/matecat/matecat_util/code/tokenizer/tokenizer.perl -X -a -l fr \
  < text3.txt > text4.txt
/home/moses/mosesdecoder/scripts/recaser/truecase.perl \
  --model /home/moses/working/truecaser/truecase-model.1.fr \
  < text4.txt > text5.txt
/home/moses/mosesdecoder/bin/moses \
  -f /home/moses/working/tuning/moses.tuned.ini.1 \
  < text5.txt > text6.txt
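For reference, the same five steps could probably be chained without the intermediate text2.txt ... text5.txt files. The following is only a rough Python sketch: the paths and flags are copied unchanged from the commands above, while build_commands and run_pipeline are names made up for this illustration and are not part of Moses or matecat_util.

```python
import subprocess

# Installation paths as used in the commands above.
MOSES = "/home/moses/mosesdecoder"
MATECAT = "/home/moses/matecat/matecat_util/code/tokenizer"
WORKING = "/home/moses/working"


def build_commands():
    """Return the five pipeline stages as argv lists, in execution order."""
    return [
        [f"{MOSES}/scripts/tokenizer/normalize-punctuation.perl", "fr"],
        [f"{MATECAT}/deescape-special-chars.perl"],
        [f"{MATECAT}/tokenizer.perl", "-X", "-a", "-l", "fr"],
        [f"{MOSES}/scripts/recaser/truecase.perl", "--model",
         f"{WORKING}/truecaser/truecase-model.1.fr"],
        [f"{MOSES}/bin/moses", "-f", f"{WORKING}/tuning/moses.tuned.ini.1"],
    ]


def run_pipeline(text):
    """Feed `text` through every stage; each stdout becomes the next stdin."""
    for argv in build_commands():
        result = subprocess.run(
            argv,
            input=text,
            capture_output=True,  # stderr (e.g. decoder logging) is captured too
            encoding="utf-8",
            check=True,           # stop at the first stage that fails
        )
        text = result.stdout
    return text
```

With this, something like `open("text6.txt", "w", encoding="utf-8").write(run_pipeline(open("text1.txt", encoding="utf-8").read()))` should produce the same text6.txt as the separate commands, assuming every script reads stdin and writes stdout as it does above.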
Then text6.txt contains:
<g id="1"> banks in the euro zone are subject :</g>
<g id="1"> ratio of capital linked to the detention of risky assets ( we
are here to credit ;</g> )
<g id="1"> the leverage ratio , which determines the regulatory capital
from the size of the balance sheet of the bank ;</g>
<g id="1"> ratio of liquidity , which requires banks to hold especially
important portfolios of securities .</g> public
However, neither the detokenizer nor the detruecaser gives me the correct
output. For example, "banks" does not get an uppercase B.
I also looked at
https://github.com/christianbuck/matecat_util/tree/master/python_server
and
https://github.com/christianbuck/matecat_util/tree/master/code/tags4moses
but had no luck.
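To make the casing I expect concrete, here is a minimal Python sketch of a post-processing step. It rests on an assumption I have not verified: that "banks" stays lowercase because the first token on the line is the tag rather than a word. The function name capitalize_after_tags is made up for this illustration and is not part of Moses or matecat_util.

```python
import re

# Leading whitespace plus any run of XML-like tags at the start of a line,
# followed by the rest of the line.
LEADING_TAGS = re.compile(r"^(\s*(?:<[^>]+>\s*)*)(.*)$", re.DOTALL)


def capitalize_after_tags(line):
    """Uppercase the first character of the text that follows leading tags."""
    prefix, rest = LEADING_TAGS.match(line).groups()
    return prefix + rest[:1].upper() + rest[1:]


print(capitalize_after_tags('<g id="1"> banks in the euro zone are subject :</g>'))
# prints: <g id="1"> Banks in the euro zone are subject :</g>
```

This capitalizes every line unconditionally, so the list items that start in lowercase in the French source would also get an uppercase first letter, and it does nothing about the closing tags that end up in the wrong place in text6.txt.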