Dear All

Moses provides a recaser and a truecaser.  I am unsure about which I 
should use or whether I should use them at all.  Please can anyone advise?

This is how I understand them (please correct me):

With the recaser you build a Moses decoder as normal, with lowercased 
data.  You also train a separate recaser with cased data of the target 
language.  You can then run the recaser on the lowercased output from 
the Moses decoder.
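
To make my understanding of recasing concrete, here is a toy sketch in Python. It is only an illustration of the idea, not how Moses implements its recaser: the function names train_recaser and recase are hypothetical, and the model is just a lookup of each word's most frequent cased form, with no use of context.

```python
from collections import Counter, defaultdict


def train_recaser(cased_target_sentences):
    """Learn the most frequent cased form of every word in cased target data."""
    counts = defaultdict(Counter)
    for sentence in cased_target_sentences:
        for token in sentence.split():
            counts[token.lower()][token] += 1
    # Keep only the most frequent surface form for each lowercased word.
    return {lower: forms.most_common(1)[0][0] for lower, forms in counts.items()}


def recase(lowercased_sentence, model):
    """Restore case on lowercased decoder output, then capitalise the first word."""
    tokens = [model.get(token, token) for token in lowercased_sentence.split()]
    if tokens:
        tokens[0] = tokens[0][:1].upper() + tokens[0][1:]
    return " ".join(tokens)


model = train_recaser(["I live in Bangor", "Bangor is in Wales"])
print(recase("he works in bangor", model))
```

Words the toy model has never seen are left in lowercase, apart from the sentence-initial one.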

With the truecaser you build a Moses decoder with cased data (keeping 
words in their natural case).  You build a truecaser with cased data of 
the source language.  Input to the decoder must be piped through the 
truecaser; output from the decoder is piped through a detruecaser.
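
Again as a toy sketch of how I picture it (an assumption on my part, not the Moses implementation; train_truecaser, truecase and detruecase here are hypothetical Python functions, not the Moses scripts): the truecaser learns each word's usual case from positions that are not sentence-initial, then rewrites only the first word of each input sentence. The detruecaser simply capitalises the first character of the output.

```python
from collections import Counter, defaultdict


def train_truecaser(cased_sentences):
    """Learn each word's usual case, ignoring sentence-initial tokens."""
    counts = defaultdict(Counter)
    for sentence in cased_sentences:
        # Skip the first token: its capital letter may be positional only.
        for token in sentence.split()[1:]:
            counts[token.lower()][token] += 1
    return {lower: forms.most_common(1)[0][0] for lower, forms in counts.items()}


def truecase(sentence, model):
    """Give the sentence-initial word its usual case; leave other words alone."""
    tokens = sentence.split()
    if tokens:
        first = tokens[0]
        # Unknown words are left as they are in this toy version.
        tokens[0] = model.get(first.lower(), first)
    return " ".join(tokens)


def detruecase(sentence):
    """Capitalise the first character of decoder output."""
    return sentence[:1].upper() + sentence[1:]


model = train_truecaser([
    "We saw the house in Bangor",
    "Then the house fell",
    "I like Bangor",
])
print(truecase("The house is old", model))
print(truecase("Bangor is in Wales", model))
print(detruecase("the house is old"))
```

In this sketch "The" is lowercased because it is normally lowercase mid-sentence, while "Bangor" keeps its capital.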

What is the difference between recasing and truecasing (other than the 
above)?

It seems possible to me that using the truecaser might affect 
translation quality.  Does it improve or worsen translation quality 
significantly?

Why is it preferable to use the truecaser, rather than building a 
decoder using cased data (but where sentence-initial words are not 
necessarily capitalised)?

Best wishes

Ivan

-- 
********************************
Ivan Uemlianin

Canolfan Bedwyr
Safle'r Normal Site
Prifysgol Bangor University
BANGOR
Gwynedd
LL57 2PZ

[email protected]
http://www.bangor.ac.uk/~cbs007/
********************************

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
