Hi,

Sounds reasonable to me, but it would be good to have this as an option, as Miles suggested.
-phi

On 25 Oct 2010 17:40, "Ben Gottesman" <[email protected]> wrote:
> Hi,
>
> Are truecase models still widely in use?
>
> I have a proposal for a tweak to the train-truecaser.perl script.
>
> Currently, we don't take the first token of a sentence as evidence for the
> true casing of that type, on the basis that the first word of a sentence is
> always capitalized. The first token of a segment is always assumed to be
> the first word of a sentence, and thus is never taken as casing evidence.
>
> However, if a given segment is only one token long, then the segment is
> probably not a sentence, and the token is quite possibly in its natural
> case. So my proposal is to take the sole token of one-token segments as
> evidence for true casing.
>
> I attach the code change.
>
> Any objections? If not, I'll check it in.
>
> Ben
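
For anyone following the thread, the proposed rule can be sketched roughly as below. This is only an illustration in Python, not the attached change and not the actual train-truecaser.perl code: the function names and the count_single_token_segments flag are hypothetical, and it assumes whitespace-tokenized segments. The flag stands in for the "make it an option" suggestion.

```python
from collections import Counter, defaultdict


def collect_casing_evidence(segments, count_single_token_segments=True):
    """Count observed surface forms for each lowercased type.

    Hypothetical sketch: the first token of a segment is skipped as
    casing evidence, except (optionally) when it is the only token.
    """
    evidence = defaultdict(Counter)
    for segment in segments:
        tokens = segment.split()
        if not tokens:
            continue
        # Default: treat the first token as sentence-initial and skip it.
        start = 1
        # Proposed tweak: a one-token segment is probably not a sentence,
        # so its sole token is counted as evidence.
        if count_single_token_segments and len(tokens) == 1:
            start = 0
        for token in tokens[start:]:
            evidence[token.lower()][token] += 1
    return evidence


def best_casing(evidence):
    """Pick the most frequently observed surface form for each type."""
    return {t: forms.most_common(1)[0][0] for t, forms in evidence.items()}
```

With the flag off, a segment such as "Paris" on its own contributes nothing; with it on, it adds one count for the form "Paris".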
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
