Dear All,

Moses provides a recaser and a truecaser. I am unsure which I should use, or whether I should use them at all. Can anyone advise?

This is how I understand them (please correct me):

With the recaser, you build a Moses decoder as normal, with lowercased data. You also train a separate recaser with cased data of the target language. You can then run the recaser on the lowercased output from the Moses decoder.

With the truecaser, you build a Moses decoder with cased data (keeping words in their natural case). You build a truecaser with cased data of the source language. Input to the decoder must be piped through the truecaser; output from the decoder is piped through a detruecaser.

What is the difference between recasing and truecasing (other than the above)?

It seems possible to me that using the truecaser might affect translation quality. Does it improve or worsen translation quality significantly?

Why is it preferable to use the truecaser, rather than building a decoder using cased data (but where sentence-initial words are not necessarily capitalised)?

Best wishes

Ivan

--
********************************
Ivan Uemlianin
Canolfan Bedwyr
Safle'r Normal Site
Prifysgol Bangor University
BANGOR
Gwynedd
LL57 2PZ
[email protected]
http://www.bangor.ac.uk/~cbs007/
********************************

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
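To make the truecasing idea described in this message concrete, here is a minimal Python sketch. It is a toy illustration only, not the Moses truecaser or its scripts: the function names (`train_truecaser`, `truecase`, `detruecase`) and the sample sentences are hypothetical, and the model assumed here is simply "the most frequent surface form of each word when it is not sentence-initial".

```python
from collections import Counter, defaultdict


def train_truecaser(cased_sentences):
    """Learn the most frequent surface form of each word.

    Sentence-initial tokens are skipped because their capitalisation
    is positional rather than lexical. (Assumption of this sketch.)
    """
    counts = defaultdict(Counter)
    for sentence in cased_sentences:
        for token in sentence.split()[1:]:
            counts[token.lower()][token] += 1
    return {lower: forms.most_common(1)[0][0] for lower, forms in counts.items()}


def truecase(sentence, model):
    """Replace the sentence-initial token with its most frequent form.

    Words the model has never seen are left unchanged.
    """
    tokens = sentence.split()
    if tokens:
        first = tokens[0]
        tokens[0] = model.get(first.lower(), first)
    return " ".join(tokens)


def detruecase(sentence):
    """Capitalise the first character of the sentence again."""
    return sentence[:1].upper() + sentence[1:]


# Hypothetical tokenised training sentences.
corpus = [
    "The cat saw Paris in the rain .",
    "Yesterday the dog left Paris .",
]
model = train_truecaser(corpus)

print(truecase("The dog slept .", model))   # the dog slept .
print(truecase("Paris is big .", model))    # Paris is big .
print(detruecase("the dog slept ."))        # The dog slept .
```

In this sketch only the sentence-initial word is ever changed, so "Paris" keeps its capital everywhere. Lowercasing all data and recasing afterwards would instead discard that distinction before translation and have to recover it from the target-language context.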
