recaser: builds a Moses model that translates each lowercased word to its cased form, and also uses a language model. The input to the recaser is lowercased text.
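As a rough illustration of the recaser idea (not the actual Moses implementation), the sketch below picks a cased form for each lowercased word by combining a per-word casing probability with a bigram language model over cased tokens, searched with Viterbi. The names `train`, `recase` and the smoothing constant `alpha` are made up for this example, and the three-sentence corpus is a toy.

```python
import math
from collections import Counter, defaultdict


def train(cased_sentences):
    """Count cased forms per lowercased word and bigrams over cased tokens."""
    options = defaultdict(Counter)  # lowercased word -> counts of cased forms
    bigrams = defaultdict(Counter)  # previous cased token -> counts of next token
    for sentence in cased_sentences:
        prev = "<s>"
        for token in sentence.split():
            options[token.lower()][token] += 1
            bigrams[prev][token] += 1
            prev = token
    return options, bigrams


def recase(lowercased, options, bigrams, alpha=0.1):
    """Viterbi search for the best cased sequence (toy stand-in for a recaser)."""
    # Assumption: add-alpha smoothing over the set of observed cased forms.
    vocab = sum(len(forms) for forms in options.values()) + 1

    def lm(prev, token):
        seen = bigrams.get(prev, Counter())
        return math.log((seen[token] + alpha) / (sum(seen.values()) + alpha * vocab))

    # beams maps the last cased token to (log score, best path ending there).
    beams = {"<s>": (0.0, [])}
    for word in lowercased.split():
        # Unknown words keep their lowercased form.
        candidates = options.get(word) or Counter({word: 1})
        total = sum(candidates.values())
        new_beams = {}
        for form, count in candidates.items():
            casing = math.log(count / total)  # "translation" score lower -> cased
            score, path = max(
                ((s + lm(prev, form) + casing, p) for prev, (s, p) in beams.items()),
                key=lambda item: item[0],
            )
            new_beams[form] = (score, path + [form])
        beams = new_beams
    return " ".join(max(beams.values(), key=lambda item: item[0])[1])


corpus = ["We met in New York .", "It is a new car .", "I love New York ."]
options, bigrams = train(corpus)
print(recase("a new car in new york", options, bigrams))
```

The language model is what lets the same lowercased word ("new") come out differently depending on context, which a pure per-word frequency count cannot do.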
truecaser: builds a casing model based on the number of times each cased version of a word appears in the text (e.g. rivet (4/8), Rivet (3), RIVET (1)). The input to the truecaser is the text as is, not lowercased. Therefore, if the text is noisy, such as tweets, the recaser may perform better.

Best regards,
Ergun

Ergun Biçici, CNGL, School of Computing, DCU, www.cngl.ie
http://www.computing.dcu.ie/~ebicici/

On Wed, May 20, 2015 at 8:07 PM, Philipp Koehn <[email protected]> wrote:

> Hi,
>
> yes, this is what the RECASER section in EMS enables.
>
> -phi
>
> On Wed, May 20, 2015 at 2:50 PM, Lane Schwartz <[email protected]> wrote:
>
>> Got it. So then, how was casing handled in the "mbr/mp" column? Was all
>> of the data lowercased, then models trained, then recasing applied after
>> decoding? Or something else?
>>
>> On Wed, May 20, 2015 at 1:30 PM, Philipp Koehn <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> no, the changes are made incrementally.
>>>
>>> So the recased "baseline" is the previous "mbr/mp" column.
>>>
>>> -phi
>>>
>>> On Wed, May 20, 2015 at 2:01 PM, Lane Schwartz <[email protected]> wrote:
>>>
>>>> Philipp,
>>>>
>>>> In Table 2 of the WMT 2009 paper, are the "baseline" and "truecased"
>>>> columns directly comparable? In other words, do the two columns indicate
>>>> identical conditions other than a single variable (how and/or when casing
>>>> was handled)?
>>>>
>>>> In the baseline condition, how and when was casing handled?
>>>>
>>>> Thanks,
>>>> Lane
>>>>
>>>> On Wed, May 20, 2015 at 12:43 PM, Philipp Koehn <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> see Section 2.2 in our WMT 2009 submission:
>>>>> http://www.statmt.org/wmt09/pdf/WMT-0929.pdf
>>>>>
>>>>> One practical reason to avoid recasing is the need
>>>>> for a second large cased language model.
>>>>>
>>>>> But there is of course also the practical issue of
>>>>> having a unique truecasing scheme for each data
>>>>> condition, handling of headlines, all-caps emphasis,
>>>>> etc.
>>>>>
>>>>> It would be worth revisiting this issue under
>>>>> different data conditions / language pairs. Both
>>>>> options are readily available in EMS.
>>>>>
>>>>> Each of the two alternative methods could be
>>>>> improved as well. See for instance:
>>>>> http://www.aclweb.org/anthology/N06-1001
>>>>>
>>>>> -phi
>>>>>
>>>>> On Wed, May 20, 2015 at 12:31 PM, Lane Schwartz <[email protected]> wrote:
>>>>>
>>>>>> Philipp (and others),
>>>>>>
>>>>>> I'm wondering what people's experience is regarding when truecasing
>>>>>> is applied.
>>>>>>
>>>>>> One option is to truecase the training data, then train your TM and
>>>>>> LM using that truecased data. Another option would be to lowercase the
>>>>>> data, train TM and LM on the lowercased data, and then perform truecasing
>>>>>> after decoding.
>>>>>>
>>>>>> I assume that the former gives better results, but the latter
>>>>>> approach has an advantage in terms of extensibility (namely if you get
>>>>>> more data and update your truecase model, you don't have to re-train
>>>>>> all of your TMs and LMs).
>>>>>>
>>>>>> Does anyone have any insights they would care to share on this?
>>>>>>
>>>>>> Thanks,
>>>>>> Lane
>>>>
>>>> --
>>>> When a place gets crowded enough to require ID's, social collapse is not
>>>> far away. It is time to go elsewhere. The best thing about space travel
>>>> is that it made it possible to go elsewhere.
>>>> -- R.A. Heinlein, "Time Enough For Love"

_______________________________________________ Moses-support mailing list [email protected] http://mailman.mit.edu/mailman/listinfo/moses-support
