Hi, Marcin,

yes, the root of the trouble is that all possibilities are multiplied. 
Cube pruning can be considered "just a clever speedup" (a very clever 
one, of course), but I think implementing something similar would not 
be very useful here. It of course depends on what you actually use the 
factors for, but if you use them for morphology, or for anything where 
a particular choice needs support from the surrounding context, you 
can't avoid the problem.

Consider a noun: using a few factored steps, you can easily produce 
translation options for all potential cases. But without the context of 
the preceding verb, preposition or, say, an adjective, you can't pick 
the correct one. So pruning the translation options for the noun is 
likely to prevent you from getting the agreement right. I've run into 
this issue a few times already (most recently this year, 
http://aclweb.org/anthology-new/W/W12/W12-3130.pdf) and I've tried to 
circumvent it with a two-step approach, which postpones the 
morphological explosion to a separate search (where the lemmas are 
already chosen). Needless to say, Alex Fraser (in the follow-up work to 
http://www.statmt.org/wmt09/pdf/WMT-0920.pdf) was somewhat more 
successful.

So you don't want to just limit the number of options; what you 
actually want is to select the good ones...
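
To make that concrete, here is a tiny Python sketch (all forms and 
scores are made up, nothing Moses-specific) of what happens when a 
generation step expands a noun into all of its observed forms and a 
per-span top-k cut is applied before any context is available:

  # Toy illustration, not Moses code: a generation step expands a noun
  # lemma into every POS+morph variant observed for it in training.
  # The scores are invented, context-free model scores.
  variants = {
      "noun+nom.sg": -1.0,
      "noun+acc.sg": -1.2,
      "noun+gen.sg": -2.5,
      "noun+dat.sg": -2.7,
      "noun+ins.sg": -3.1,  # the form the context will turn out to need
      "noun+voc.sg": -4.0,
  }

  k = 2  # per-span pruning limit, applied before the search sees context
  kept = sorted(variants, key=variants.get, reverse=True)[:k]
  print("kept after pruning:", kept)

  # Suppose the preceding preposition governs the instrumental case.
  # That variant scored poorly in isolation, so it was already pruned,
  # and no amount of clever search can restore the agreement later.
  print("required form survived:", "noun+ins.sg" in kept)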

O.

On 06/10/2012 08:21 PM, Marcin Junczys-Dowmunt wrote:
> Hi Ondrej,
> The blow-up is happening in "DecodeStepGeneration::Process(...)", right?
> If I understand the code correctly at first glance, all possibilities
> are simply multiplied. And indeed, there seems to be no way to limit the
> number of combinations in this step. Could something like Cube-Pruning
> work here to limit the number of options right from the beginning?
> Best,
> Marcin
>
> On 10.06.2012 19:02, Ondrej Bojar wrote:
>> Dear Marcin,
>>
>> the short answer is: you need to avoid the blow-up.
>>
>> The options that affect pruning during the creation of translation
>> options are:
>>
>> -ttable-limit ...how many variants of a phrase to read from the
>> phrase table.
>>
>> -max-partial-trans-opt ...how many partial translation options are
>> considered for a span. This is the critical pruning to contain the
>> blow-up in memory.
>>
>> -max-trans-opt-per-coverage ...how many finished options should then
>> be passed to the search.
>>
>> -translation-option-threshold ...the same thing, but expressed
>> relative to the score of the best one.
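
(For concreteness: all four switches can be given on the moses command 
line; the values below are only illustrative starting points, not tuned 
recommendations:

   moses -f moses.ini \
         -ttable-limit 20 \
         -max-partial-trans-opt 10000 \
         -max-trans-opt-per-coverage 50 \
         -translation-option-threshold 0.1 \
         < input.txt > output.txt

I believe the same switches can also be set in moses.ini itself.)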
>>
>> If you set up the model so that it does blow up, but avoid thrashing
>> your machine by setting -max-partial-trans-opt reasonably low, you are
>> very likely to get a lot of search errors, because the pruning of
>> translation options happens too early, without the linear context of
>> the surrounding translation options. Moses simply does not have good
>> means to handle the combinatorics of factored models.
>>
>> Cheers, Ondrej.
>>
>> On 06/10/2012 06:40 PM, Marcin Junczys-Dowmunt wrote:
>>> Hi,
>>> by the way, are there some best-practice decoder settings for heavily
>>> factored models with combinatorial blow-up? If I am not mistaken, most
>>> settings affect hypothesis recombination later on. Here, the heavy work
>>> happens during the creation of target phrases and the future-score
>>> calculation, before the actual translation.
>>> Best,
>>> Marcin
>>>
>>> On 09.06.2012 16:45, Philipp Koehn wrote:
>>>> Hi,
>>>>
>>>> the idea here was to create a link between the
>>>> words and POS tags early on and use this as
>>>> an additional scoring function. But if you see better
>>>> performance with your setting, please report back.
>>>>
>>>> -phi
>>>>
>>>> On Fri, Jun 8, 2012 at 6:03 PM, Marcin Junczys-Dowmunt
>>>> <[email protected]>  wrote:
>>>>> Hi all,
>>>>> I have a question concerning the "Tutorial for Using Factored Models",
>>>>> section on "Train a morphological analysis and generation model".
>>>>>
>>>>> The following translation factors and generation factors are
>>>>> trained for the given example corpus:
>>>>>
>>>>> --translation-factors 1-1+3-2 \
>>>>> --generation-factors 1-2+1,2-0 \
>>>>> --decoding-steps t0,g0,t1,g1
>>>>>
>>>>> What is the advantage of using the first generation factor 1-2 compared
>>>>> to the configuration below?
>>>>>
>>>>> --translation-factors 1-1+3-2 \
>>>>> --generation-factors 1,2-0 \
>>>>> --decoding-steps t0,t1,g1
>>>>>
>>>>> I understand that the 1-2 generation factor maps lemmas to POS+morph
>>>>> information, but the same information is also generated by the 3-2
>>>>> translation factor. Apart from that, this generation factor introduces
>>>>> a huge combinatorial blow-up, since every lemma can be mapped to
>>>>> basically every piece of morphological information ever seen for that
>>>>> lemma.
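
To put rough numbers on the blow-up Marcin describes just above, here 
is a back-of-the-envelope sketch in Python, with all counts invented 
purely for illustration:

  # Toy counts, not taken from any real model.
  n_lemma_translations = 10  # t0: options per source lemma
  n_tags_per_lemma     = 30  # g0 (1-2): every POS+morph seen for a lemma
  n_tag_translations   = 5   # t1 (3-2): options per source morph tag

  # With --decoding-steps t0,g0,t1,g1 the generation step g0 expands
  # every lemma option by every observed tag; t1 can only filter later:
  after_g0 = n_lemma_translations * n_tags_per_lemma
  print("t0,g0,t1,g1: partial options after g0 =", after_g0)  # 300

  # With --decoding-steps t0,t1,g1 the tags come from t1 alone, so the
  # intermediate product stays much smaller:
  after_t1 = n_lemma_translations * n_tag_translations
  print("t0,t1,g1: partial options after t1 =", after_t1)  # 50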

-- 
Ondrej Bojar (mailto:[email protected] / [email protected])
http://www.cuni.cz/~obo