Hi all,

I would like some help from anyone with experience with Brill's POS tagger. I
have an encoding problem in the output file produced by the rule learner
(the unknown-lexical-learn.prl script). All the files generated up to this
point are UTF-8 encoded and look fine, but the output file created by the
rule learner does not display Greek characters correctly.
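
To narrow this down, I put together a small check of the raw bytes of a
file. It is only a rough sketch in Python (not part of Brill's tagger), and
the function names and result labels are my own. It assumes the likely
failure modes are either bytes that are not valid UTF-8 at all, or UTF-8
text that was read as Latin-1 and written out again as UTF-8 ("double
encoding"); other causes are possible.

```python
import sys


def has_greek(text: str) -> bool:
    """True if the text contains any character from the Greek Unicode blocks."""
    return any(
        "\u0370" <= ch <= "\u03ff" or "\u1f00" <= ch <= "\u1fff" for ch in text
    )


def diagnose(data: bytes) -> str:
    """Return a rough label describing how `data` appears to be encoded.

    The labels are hypothetical names chosen for this sketch.
    """
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        # The bytes are not UTF-8 at all (e.g. a legacy 8-bit Greek encoding).
        return "invalid-utf8"
    if has_greek(text):
        return "utf8-greek"
    try:
        # Undo one round of "UTF-8 read as Latin-1, then saved as UTF-8".
        repaired = text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return "utf8-no-greek"
    return "double-encoded" if has_greek(repaired) else "utf8-no-greek"


if __name__ == "__main__":
    # Usage: python check_encoding.py FILE [FILE ...]
    for path in sys.argv[1:]:
        with open(path, "rb") as handle:
            print(path, diagnose(handle.read()))
```

Running it on the learner's input files and on its output file should at
least show at which step the bytes change.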

Hoping that this issue will be solved and I will finally manage to create a
Greek POS tagger, I was wondering whether tagging the English source text too
and adding more generation/translation steps would produce better results.
Also, the Greek POS tagset I use is the one provided by Xerox. In that case, I
suppose I will have to use the same tagset for tagging the English text too,
right? I am not very happy with Xerox's tagset because it is fairly limited,
and I think Brill's English tagger uses the Penn Treebank tagset, so I am a
little confused. Should I just use the Penn tagset for tagging the Greek text
too? (I have only tagged 2000 words so far, so I could change the tags
accordingly without too much effort.)

Thanks

Panos

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
