This sounds risky to me. It would be better to let the user specify the behaviour: add an extra flag that enables your suggested change, and keep the default so that truecasing operates as it used to.
Miles

On 25 October 2010 17:37, Ben Gottesman <[email protected]> wrote:
> Hi,
>
> Are truecase models still widely in use?
>
> I have a proposal for a tweak to the train-truecaser.perl script.
>
> Currently, we don't take the first token of a sentence as evidence for the
> true casing of that type, on the basis that the first word of a sentence is
> always capitalized. The first token of a segment is always assumed to be
> the first word of a sentence, and thus is never taken as casing evidence.
>
> However, if a given segment is only one token long, then the segment is
> probably not a sentence, and the token is quite possibly in its natural
> case. So my proposal is to take the sole token of one-token segments as
> evidence for true casing.
>
> I attach the code change.
>
> Any objections? If not, I'll check it in.
>
> Ben
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>

--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
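
To make the two positions in this thread concrete, here is a rough sketch of the proposed rule combined with the opt-in flag suggested in the reply. It is written in Python for illustration only and is not the code in train-truecaser.perl or the attached change. The function name `collect_casing_evidence` and the flag `count_single_token_segments` are hypothetical, and the sketch ignores everything else the real script may do when deciding where a sentence starts.

```python
from collections import Counter, defaultdict


def collect_casing_evidence(segments, count_single_token_segments=False):
    """Count the surface casings observed for each lowercased word type.

    Hypothetical illustration, not the Moses implementation.
    """
    evidence = defaultdict(Counter)
    for segment in segments:
        tokens = segment.split()
        # Default behaviour: skip the first token, because its capital
        # letter may only mark the start of a sentence.
        start = 1
        # Opt-in behaviour: a one-token segment is probably not a
        # sentence, so its sole token is counted as casing evidence.
        if count_single_token_segments and len(tokens) == 1:
            start = 0
        for token in tokens[start:]:
            evidence[token.lower()][token] += 1
    return evidence


segments = ["The cat sat", "iPhone", "London", "The iPhone is new"]
default_counts = collect_casing_evidence(segments)
opt_in_counts = collect_casing_evidence(
    segments, count_single_token_segments=True
)
```

With the flag off, the one-token segments "iPhone" and "London" contribute nothing, which matches the old behaviour. With the flag on, they are counted, so "London" gains evidence it would otherwise never have in this toy input. Keeping the flag off by default means existing models would be trained exactly as before.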
