Hi Floran,

If you have one file with words and one file with POS tags, you can combine the two with the combine_factors.pl script in mosesdecoder/scripts/training.
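
To illustrate what "combining" means here, below is a minimal Python sketch of the idea. It is not the Perl script itself, and the function name and sample sentences are made up for illustration. It assumes the input files are line-aligned and token-aligned, i.e. line N of every file has the same number of whitespace-separated tokens, and that no token already contains the "|" separator.

```python
from typing import Iterable, Iterator


def combine_factors(*streams: Iterable[str], sep: str = "|") -> Iterator[str]:
    """Yield factored lines built from parallel, token-aligned line streams.

    Hypothetical helper for illustration only; the name is borrowed from the
    Moses script but this is not its implementation.
    """
    # zip() stops at the shortest stream, so all inputs are assumed to have
    # the same number of lines.
    for line_no, lines in enumerate(zip(*streams), start=1):
        columns = [line.split() for line in lines]
        counts = [len(col) for col in columns]
        if len(set(counts)) != 1:
            raise ValueError(f"line {line_no}: token counts differ: {counts}")
        # Join the i-th token of every stream with the factor separator.
        yield " ".join(sep.join(factors) for factors in zip(*columns))


# Made-up sample data: one sentence with surface forms, POS tags and lemmas.
words = ["the house is small"]
pos = ["DT NN VBZ JJ"]
lemmas = ["the house be small"]

factored = list(combine_factors(words, pos, lemmas))
print(factored[0])  # the|DT|the house|NN|house is|VBZ|be small|JJ|small
```

With real files, the lists can be replaced by open file handles, since the function only iterates over lines.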

best wishes,
Rico

On 18.07.2016 10:44, Gmehlin Floran wrote:
Hi,

I would like to try factored training on my corpus. I see that TreeTagger (from uni-muenchen.de) can tag a text file and output the POS tags. However, I haven't been able to produce the format that Moses expects (with POS tags and lemmas). There are a bunch of scripts in the scripts/training/wrappers/ folder, including one for TreeTagger, but all it does is produce a separate file with POS tags only.

I have seen that this question was already posted two years ago on this mailing list, but it remained unanswered.

Is there a script, or some other way, to process a text file and get as output a file in the Moses format for factored training?

E.g.:
word0factor0|word0factor1|word0factor2 word1factor0|word1factor1|word1factor2

Thank you for your help!


_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
