I always thought that lowercasing was about the sparse data problem 
and not about poor input data. But actually I'm not sure whether the GIZA 
alignments on lowercased Europarl data are any better than those on the 
original forms. Has anyone carried out a thorough comparison for various 
language pairs yet?
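
To make the sparse data point concrete, here is a minimal sketch of how 
one might measure it. It only counts word types and singletons before and 
after lowercasing; it says nothing about alignment quality. The function 
names and the toy sentences are made up for illustration and assume 
whitespace-tokenized text.

```python
from collections import Counter


def type_stats(tokens):
    """Return (number of distinct types, number of types seen exactly once)."""
    counts = Counter(tokens)
    singletons = sum(1 for c in counts.values() if c == 1)
    return len(counts), singletons


def compare_casing(sentences):
    """Compare vocabulary statistics for original and lowercased tokens.

    Assumes each sentence is already tokenized, with tokens separated
    by whitespace (hypothetical input format).
    """
    original = [tok for s in sentences for tok in s.split()]
    lowercased = [tok.lower() for tok in original]
    return {
        "original": type_stats(original),
        "lowercased": type_stats(lowercased),
    }


# Toy sample, invented for this sketch (not Europarl data).
sample = [
    "The committee approved the report .",
    "the report was approved by The Committee .",
    "Report approved .",
]
stats = compare_casing(sample)
print(stats)
```

Fewer types and fewer singletons after lowercasing would indicate less 
sparsity, but whether that translates into better alignments is exactly 
the open question.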

jörg


On Wed, 05 Mar 2008 16:42:44 +0100
  Hubert Crépy <[EMAIL PROTECTED]> wrote:
> Chris Dyer wrote:
>> One argument against preserving case information is that some of what
>> you may want to translate in a large-coverage system may be
>> incorrectly cased to begin with (e.g., informal text, such as what is
>> found in emails, newsgroups, etc.).
>>
> Good point, one that I hadn't considered: "poor quality" input (in other
> words: real world input).
> I just wonder how much harm we do to the translation of "good quality"
> input, in the hopes of fixing problems with "poor quality" input...
> Some would call me rigid, but I personally would try to favor users who
> provide good input, and not worry too much about those who don't.
>
> Faced with improper input, would it not make more sense to try and "fix
> it" in the source language before translation, rather than distorting
> the translation with the induced errors, then trying to fix the
> translation?
> 
> -- 
> Hubert Crépy
> 
> 
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support


