Hi,

Are truecase models still widely in use?

I have a proposal for a tweak to the train-truecaser.perl script. Currently, we don't take the first token of a sentence as evidence for the true casing of that type, on the basis that the first word of a sentence is always capitalized. The first token of a segment is always assumed to be the first word of a sentence, and thus is never taken as casing evidence.

However, if a given segment is only one token long, then the segment is probably not a sentence, and the token is quite possibly in its natural case. So my proposal is to take the sole token of one-token segments as evidence for true casing.

I attach the code change. Any objections? If not, I'll check it in.

Ben
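For readers without the attachment, the rule described above can be sketched roughly as follows. This is an illustrative Python sketch, not the Perl change itself: the function names `collect_casing_counts` and `best_casing` are hypothetical, whitespace tokenization is assumed, and any other heuristics the real script applies are left out.

```python
from collections import Counter, defaultdict


def collect_casing_counts(segments):
    """Count observed surface forms for each lowercased token type.

    Hypothetical helper: the first token of a multi-token segment is skipped
    (assumed to be sentence-initial), but the sole token of a one-token
    segment is counted as casing evidence.
    """
    counts = defaultdict(Counter)
    for segment in segments:
        tokens = segment.split()  # assumption: input is already tokenized
        if not tokens:
            continue
        # Proposed tweak: a one-token segment is probably not a sentence,
        # so start counting at index 0 instead of skipping the first token.
        start = 0 if len(tokens) == 1 else 1
        for token in tokens[start:]:
            counts[token.lower()][token] += 1
    return counts


def best_casing(counts):
    """Pick the most frequently observed surface form for each type."""
    return {key: forms.most_common(1)[0][0] for key, forms in counts.items()}
```

With segments such as "The cat sat" and "London", the sketch ignores "The" but records "London" as evidence for the type "london".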
Attachment: train-truecaser.perl
