Hello,
I would like to start using factored (POS-tagged) models instead of
unfactored ones, so I tried to follow the tutorial at
http://www.statmt.org/moses/?n=Moses.FactoredTutorial.
I downloaded the sample factored-corpus, but instead of the provided
SRILM language models I wanted to use KenLM, so I proceeded as
follows:
lmplz -o 5 < ../factored-corpus/proj-syndicate.1000.en > ../factored-corpus/kenlm/proj-syndicate.en.1000.arpa
lmplz -o 4 < ../factored-corpus/proj-syndicate.1000.de > ../factored-corpus/kenlm/proj-syndicate.de.1000.arpa
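The train-model.perl call that follows declares the LM with "--lm 2:5:...", i.e. as a model over factor 2 (the POS tag, judging by the "putin|nnp" output), while the lmplz commands above are fed the factored files directly. Assuming those files carry all factors per token (word|...|pos), an LM meant to score only POS tags would presumably need training text containing just that factor. A minimal sketch, assuming factors are separated by "|" and indexed from 0; the function names are hypothetical and not part of Moses or KenLM:

```python
from typing import Iterable, Iterator


def extract_factor(line: str, index: int, sep: str = "|") -> str:
    """Keep only factor `index` (0-based) of every whitespace-separated token."""
    return " ".join(tok.split(sep)[index] for tok in line.split())


def extract_corpus(lines: Iterable[str], index: int) -> Iterator[str]:
    # One output line per input line, so sentence boundaries are kept for lmplz.
    for line in lines:
        yield extract_factor(line, index)
```

The resulting lines could be written to a file and passed to "lmplz -o 5" in place of the full factored file; a token with fewer factors than `index` raises IndexError rather than being silently skipped.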
train-model.perl --root-dir pos-kenlm-small \
--corpus factored-corpus/proj-syndicate.1000 \
--f de --e en \
--lm 2:5:/home/moses/mt/moses3/factored-corpus/kenlm/proj-syndicate.en.1000.arpa:8 \
--translation-factors 0-0,2 \
-mgiza \
--external-bin-dir ./training-tools
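The "--lm" value above packs four colon-separated fields, factor:order:file:type, where type 8 selects KenLM. The sketch below simply splits that value so each field is visible; the LmSpec and parse_lm_spec names are hypothetical and not part of Moses:

```python
from typing import NamedTuple


class LmSpec(NamedTuple):
    factor: int   # output factor the LM scores (0 = surface form)
    order: int    # n-gram order; should match the -o given to lmplz
    path: str     # ARPA (or binarized) model file
    lm_type: int  # LM implementation code; 8 selects KenLM


def parse_lm_spec(spec: str) -> LmSpec:
    # Split the two leading fields off the front and the type off the back,
    # so the file path in the middle is kept intact.
    factor, order, rest = spec.split(":", 2)
    path, lm_type = rest.rsplit(":", 1)
    return LmSpec(int(factor), int(order), path, int(lm_type))
```

Applied to the value used here, it yields factor 2, order 5 and type 8, which matches the 5-gram built with "lmplz -o 5" for the English side.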
I am able to run the decoder:
echo "putin beschreibt menschen ." | moses -f pos-kenlm-small/model/moses.ini
BEST TRANSLATION: putin|nnp describes|vbz people|nns
Next, I wanted to check for myself that the factored model can handle
a reordered input sentence when the reordering model is downweighted,
just like in the above-mentioned tutorial:
echo "menschen beschreibt putin ." | moses -f pos-kenlm-small/model/moses.ini -dl -1
BEST TRANSLATION: people|nns describes|vbz putin|nnp
In the tutorial, the better translation "putin describes people" is
returned. Note that instead of the "-d 0.2" option mentioned in the
tutorial, I used "-dl -1" to downweight the reordering model, since
"-d" is no longer supported. I am not sure whether that is correct.
Thank you for any advice.
Best regards,
Stanislav Kurik
_______________________________________________
Moses-support mailing list
http://mailman.mit.edu/mailman/listinfo/moses-support