Hi Ondrej,
thank you for the literature pointers. I am trying to build as good a baseline system as possible for English-to-Polish SMT using factors. So far it has been rather frustrating, but from what I have seen in the literature that is not really surprising. Later I hope to come up with something clever to improve upon that.
Among the work you have dedicated to the problem of Czech morphology in
English-to-Czech MT, which system would you recommend for such a baseline?
Best,
Marcin

On 10.06.2012 22:16, Ondrej Bojar wrote:
> Hi, Marcin,
>
> yes, the root of the trouble is that all possibilities are multiplied.
> Cube pruning can be considered "just a clever speedup" (a very clever
> one, of course), but I think implementing a similar thing would not be
> very useful here. This of course depends on what you actually use the
> factors for, but if you use them for morphology or anything that has to
> get support for a particular choice in the context, you can't avoid the
> problem.
>
> Consider a noun: using a few factored steps, you can easily produce
> translation options for all potential cases. But without the context of
> the preceding verb, preposition or e.g. adjective, you can't pick the
> correct one. So pruning the translation options for the noun is likely
> to prevent you from getting the agreement right. I've run into this
> issue a few times already (most recently this year,
> http://aclweb.org/anthology-new/W/W12/W12-3130.pdf) and I've tried
> circumventing it using a two-step approach, which postpones the
> morphological explosion to a separate search (where the lemmas are
> already chosen). Needless to say, Alex Fraser (in the follow-up work of
> http://www.statmt.org/wmt09/pdf/WMT-0920.pdf) was somewhat more
> successful.
>
> So you don't want to just limit the number of options; what you
> actually want is to select the good ones...
>
> O.
>
> On 06/10/2012 08:21 PM, Marcin Junczys-Dowmunt wrote:
>> Hi Ondrej,
>> The blow-up is happening in "DecodeStepGeneration::Process(...)",
>> right? If I understand the code correctly from a first glance, all
>> possibilities are simply multiplied. And indeed, there seems to be no
>> way to limit the number of combinations in this step. Could something
>> like cube pruning work here to limit the number of options right from
>> the beginning?
>> Best,
>> Marcin
>>
>> On 10.06.2012 19:02, Ondrej Bojar wrote:
>>> Dear Marcin,
>>>
>>> the short answer is: you need to avoid the blow-up.
>>>
>>> The options that affect pruning during the creation of translation
>>> options are:
>>>
>>> -ttable-limit ...how many variants of a phrase to read from the
>>> phrase table
>>>
>>> -max-partial-trans-opt ...how many partial translation options are
>>> considered for a span. This is the critical pruning to contain the
>>> blow-up in memory.
>>>
>>> -max-trans-opt-per-coverage ...how many finished options should then
>>> be passed to the search.
>>> -translation-option-threshold ...the same thing, but expressed
>>> relative to the score of the best one.
>>>
>>> If you set up the model so that it does blow up, but avoid thrashing
>>> your machine by setting -max-partial-trans-opt reasonably low, you
>>> are very likely to get a lot of search errors, because the pruning of
>>> translation options happens too early, without the linear context of
>>> the surrounding translation options. Moses simply does not have good
>>> means to handle the combinatorics of factored models.
>>>
>>> Cheers, Ondrej.
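For concreteness, a minimal sketch of how these four limits can be passed
on the decoder command line; the file names and numeric values below are
placeholders only, not recommended settings:

    # placeholder values; the right limits depend on how strongly the
    # factored steps multiply the translation options
    moses -f moses.ini \
      -ttable-limit 20 \
      -max-partial-trans-opt 10000 \
      -max-trans-opt-per-coverage 50 \
      -translation-option-threshold 0.0001 \
      < input.src > output.trg

As described above, pushing -max-partial-trans-opt down keeps memory under
control but prunes before any linear context is available, so it trades
memory for search errors.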
>>>
>>> On 06/10/2012 06:40 PM, Marcin Junczys-Dowmunt wrote:
>>>> Hi,
>>>> by the way, are there any best-practice decoder settings for heavily
>>>> factored models with combinatorial blow-up? If I am not mistaken,
>>>> most settings affect hypothesis recombination later on. Here the
>>>> heavy work happens during the creation of target phrases and the
>>>> future-score calculation, before the actual translation.
>>>> Best,
>>>> Marcin
>>>>
>>>> On 09.06.2012 16:45, Philipp Koehn wrote:
>>>>> Hi,
>>>>>
>>>>> the idea here was to create a link between the words and POS tags
>>>>> early on and use this as an additional scoring function. But if you
>>>>> see better performance with your setting, please report back.
>>>>>
>>>>> -phi
>>>>>
>>>>> On Fri, Jun 8, 2012 at 6:03 PM, Marcin Junczys-Dowmunt
>>>>> <[email protected]> wrote:
>>>>>> Hi all,
>>>>>> I have a question concerning the "Tutorial for Using Factored
>>>>>> Models", section on "Train a morphological analysis and generation
>>>>>> model".
>>>>>>
>>>>>> The following translation factors and generation factors are
>>>>>> trained for the given example corpus:
>>>>>>
>>>>>> --translation-factors 1-1+3-2 \
>>>>>> --generation-factors 1-2+1,2-0 \
>>>>>> --decoding-steps t0,g0,t1,g1
>>>>>>
>>>>>> What is the advantage of using the first generation factor 1-2
>>>>>> compared to the configuration below?
>>>>>>
>>>>>> --translation-factors 1-1+3-2 \
>>>>>> --generation-factors 1,2-0 \
>>>>>> --decoding-steps t0,t1,g1
>>>>>>
>>>>>> I understand that the 1-2 generation factor maps lemmas to
>>>>>> POS+morph information, but the same information is also generated
>>>>>> by the 3-2 translation factor. Apart from that, this generation
>>>>>> factor introduces a huge combinatorial blow-up, since every lemma
>>>>>> can be mapped to basically every piece of morphological information
>>>>>> ever seen for that lemma.

--
dr inż. Marcin Junczys-Dowmunt
Uniwersytet im. Adama Mickiewicza
Wydział Matematyki i Informatyki
ul. Umultowska 87
61-614 Poznań

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
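For reference, a minimal sketch of how the tutorial-style configuration
from the last message plugs into a full factored training call; the corpus
location, language suffixes and language-model specification are
placeholders, and other required options (e.g. the word-alignment tool
location) are omitted:

    # placeholder paths and LM spec; only the factor options matter here
    train-model.perl \
      --root-dir train --corpus factored-corpus/corpus --f en --e pl \
      --lm 0:3:/path/to/surface.lm:8 \
      --alignment-factors 0-0 \
      --translation-factors 1-1+3-2 \
      --generation-factors 1-2+1,2-0 \
      --decoding-steps t0,g0,t1,g1

The leaner variant from the question differs only in dropping the 1-2
generation table and its decoding step, which avoids generating every
morphological reading of each lemma up front.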
