This sounds risky to me.  It would be better to let the user specify
the behaviour: for your suggestion, you would add an extra flag that
enables it.  By default, truecasing would operate as it used to.
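
A rough sketch of the opt-in idea, written in Python for brevity rather
than in the Perl of train-truecaser.perl.  The function name and the
count_single_token_segments flag are hypothetical, and the logic is
simplified to just the rule described in the quoted message below (skip
the first token of each segment), not the script's actual code.

```python
from collections import Counter, defaultdict


def collect_casing_evidence(segments, count_single_token_segments=False):
    """Count the observed surface forms of each lowercased word type.

    The first token of a segment is skipped because it is assumed to be
    sentence-initial. When count_single_token_segments is True (a
    hypothetical opt-in flag), the sole token of a one-token segment is
    counted anyway.
    """
    evidence = defaultdict(Counter)
    for segment in segments:
        tokens = segment.split()
        is_single_token_segment = len(tokens) == 1
        for position, token in enumerate(tokens):
            if position == 0 and not (
                count_single_token_segments and is_single_token_segment
            ):
                # Default behaviour: never trust the first token's casing.
                continue
            evidence[token.lower()][token] += 1
    return evidence


# Illustrative input only, not real training data.
segments = ["The cat sat", "iPhone", "Paris"]
default_evidence = collect_casing_evidence(segments)
flagged_evidence = collect_casing_evidence(
    segments, count_single_token_segments=True
)
```

With the flag left off, the one-token segments contribute nothing, so
existing models would be unchanged; with it on, "iPhone" and "Paris"
are counted as casing evidence.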

Miles

On 25 October 2010 17:37, Ben Gottesman <[email protected]> wrote:
> Hi,
>
> Are truecase models still widely in use?
>
> I have a proposal for a tweak to the train-truecaser.perl script.
>
> Currently, we don't take the first token of a sentence as evidence for the
> true casing of that type, on the basis that the first word of a sentence is
> always capitalized.  The first token of a segment is always assumed to be
> the first word of a sentence, and thus is never taken as casing evidence.
>
> However, if a given segment is only one token long, then the segment is
> probably not a sentence, and the token is quite possibly in its natural
> case.  So my proposal is to take the sole token of one-token segments as
> evidence for true casing.
>
> I attach the code change.
>
> Any objections?  If not, I'll check it in.
>
> Ben
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>



-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

