Hi Floran,
If you have one file with the words and one file with the POS tags, you can
combine the two with the combine_factors.pl script in mosesdecoder/scripts/training.
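
For illustration, here is a minimal Python sketch of what such a merge amounts to. It is not the combine_factors.pl script itself; the function names are hypothetical, and it assumes the input files are sentence-aligned (same number of lines), tokenised identically (same number of tokens per line), and contain no "|" character inside a token.

```python
from contextlib import ExitStack
from itertools import zip_longest


def combine_factors(*factor_lines):
    """Merge parallel lines (one per factor) into one Moses-style factored line."""
    token_lists = [line.split() for line in factor_lines]
    # Every factor must supply exactly one value per token.
    if len({len(tokens) for tokens in token_lists}) > 1:
        raise ValueError("factor lines have different numbers of tokens")
    # zip(*token_lists) yields, for each position, the tuple of its factors.
    return " ".join("|".join(factors) for factors in zip(*token_lists))


def combine_files(paths, out_path):
    """Combine sentence-aligned factor files line by line into out_path."""
    with ExitStack() as stack:
        files = [stack.enter_context(open(p, encoding="utf-8")) for p in paths]
        out = stack.enter_context(open(out_path, "w", encoding="utf-8"))
        for lines in zip_longest(*files):
            # zip_longest pads shorter files with None, which exposes a mismatch.
            if None in lines:
                raise ValueError("input files have different numbers of lines")
            out.write(combine_factors(*lines) + "\n")


if __name__ == "__main__":
    print(combine_factors("the house", "DT NN", "the house"))
```

With hypothetical files corpus.words, corpus.pos and corpus.lemma, calling combine_files(["corpus.words", "corpus.pos", "corpus.lemma"], "corpus.factored") would write one word|POS|lemma token sequence per line.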
Best wishes,
Rico
On 18.07.2016 10:44, Gmehlin Floran wrote:
Hi,
I would like to try factored training on my corpus. I see that with
TreeTagger (from uni-muenchen.de) we can tag a text file so that it
outputs the POS. However, I haven't been able to produce the desired
format for Moses (with POS and lemmas). There are a bunch of scripts
in the scripts/training/wrappers/ folder, including one for TreeTagger,
but all it does is produce a separate file with the POS only.
I have seen that this question was already posted two years ago on this
mailing list, but it remained unanswered.
Is there a script, or some other way, to process a text file and get as
output a file in the Moses format for factored training?
E.g. :
word0factor0|word0factor1|word0factor2 word1factor0|word1factor1|word1factor2
Thank you for your help!
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support