Hi all,
I would like some help from anyone with experience with Brill's POS tagger. I have an encoding problem in the output file produced by the rule learner (the unknown-lexical-learn.prl script). All the files generated up to that point are UTF-8 encoded and look fine, but the output file created by the rule learner does not display Greek characters correctly.

Assuming this issue gets solved and I finally manage to build a Greek POS tagger, I was wondering whether tagging the English source text as well, and adding more generation/translation steps, would produce better results.

Also, the Greek POS tagset I use is the one provided by Xerox. In that case, I suppose I will have to use the same tagset for the English text too, right? I am not very happy with Xerox's tagset because it is fairly limited, and I think Brill's English tagger uses the Penn Treebank tagset, so I am a little confused. Should I just use the Penn tagset for the Greek text as well? (I have only tagged 2000 words so far, so I could change the tags accordingly without too much effort.)

Thanks,
Panos
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
