Re: [Moses-support] Placeholder drift

Tom Hoar Tue, 31 Jul 2012 09:11:40 -0700

 One correction to my statements: the {}{} token could also appear in
 the output if it is an unknown token passed through from the source
 during translation.
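 A minimal sketch for checking this, assuming the decoder input is
 whitespace-tokenized text with one sentence per line (the function name
 and sample sentences are hypothetical). Any line it reports contains {}{}
 as a source token, which could then be passed through unchanged as an
 unknown word.

```python
def lines_with_token(lines, token):
    """Return (line_number, line) pairs where `token` occurs as a whole
    whitespace-separated token, as in Moses-style tokenized text."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if token in line.split():
            hits.append((number, line))
    return hits


# Hypothetical decoder input, one tokenized sentence per line.
source = ["{}Processor{}", "the {}{} button", "{} Processor {}"]
print(lines_with_token(source, "{}{}"))  # [(2, 'the {}{} button')]
```

 Matching whole tokens rather than substrings matters here: "{}Processor{}"
 contains the characters "{}" but is a different token from "{}{}".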


 On Tue, 31 Jul 2012 17:37:28 +0200, Daniel Schaut 
 <[email protected]> wrote:
> Tom, that's a good point. Henry, you can also check your phrase table with
> queryPhraseTable to trace back the entry that may be causing the issue.
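If the phrase table is also available in its plain-text form, a sketch like
the following lists the entries whose target side contains {}{}. It assumes
the usual Moses text layout with fields separated by "|||", the source phrase
first and the target phrase second; the function name and sample entries are
hypothetical.

```python
def entries_with_target_token(phrase_table_lines, token):
    """Yield (source, target) pairs whose target phrase contains `token`
    as a whole whitespace-separated token.

    Assumes the plain-text Moses layout: 'source ||| target ||| scores ...'.
    """
    for line in phrase_table_lines:
        fields = [field.strip() for field in line.split("|||")]
        if len(fields) < 2:
            continue  # not a phrase-table entry; skip it
        source, target = fields[0], fields[1]
        if token in target.split():
            yield source, target


# Hypothetical entries, for illustration only.
table = [
    "{}Processor{} ||| {}processeur{} ||| 0.5 0.4 0.6 0.3",
    "Processor ||| {}{} processeur ||| 0.1 0.2 0.1 0.2",
]
print(list(entries_with_target_token(table, "{}{}")))
# [('Processor', '{}{} processeur')]
```

To scan a real gzipped table, pass gzip.open("phrase-table.gz", "rt",
encoding="utf-8") as the first argument (the path is hypothetical).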
>
> -----Original Message-----
> From: [email protected] 
> [mailto:[email protected]] On
> Behalf Of Tom Hoar
> Sent: 31 July 2012 16:58
> To: [email protected]
> Subject: Re: [Moses-support] Placeholder drift
>
>  John, this is true if there were three tokens, but {}Processor{} has no
>  spaces. Assuming the target-language form is {}processeur{}, also
>  without spaces, in both the parallel and LM data, the tables and the
>  language model will treat it as one token and not break it up.
>
>  Henry, I suspect your corpus preparation inserts spaces between the
>  placeholders and the word, creating {} Processor {} (3 tokens). John's
>  explanation is much more plausible if this is the case.
>
>  One oddity is the {}{} token in the output, because it is one token,
>  not two. Moses won't remove a space to splice two tokens together, so it
>  seems your target data contains {}{} as a token somewhere in the tables
>  or the LM.
>
>  I suggest you double-check your tokenization and other preparation to
>  ensure that each placeholder-wrapped word is still one token on both the
>  source and target sides when you start training.
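A minimal sketch of such a check, assuming the prepared parallel data is
whitespace-tokenized, one sentence per line, with source and target lines
aligned by position. The names and sample data are hypothetical. It flags
pairs where a bare {} has become its own token (the 3-token case) or where
the number of placeholders differs between the two sides.

```python
PLACEHOLDER = "{}"


def placeholder_problems(src_line, tgt_line):
    """Return human-readable problems found in one sentence pair."""
    problems = []
    for side, line in (("source", src_line), ("target", tgt_line)):
        # A bare "{}" token means the placeholder was split off the word.
        if PLACEHOLDER in line.split():
            problems.append(f"{side}: placeholder split into its own token")
    if src_line.count(PLACEHOLDER) != tgt_line.count(PLACEHOLDER):
        problems.append("placeholder count differs between source and target")
    return problems


def check_corpus(src_lines, tgt_lines):
    """Yield (line_number, problems) for every pair with at least one problem."""
    if len(src_lines) != len(tgt_lines):
        raise ValueError("source and target have different line counts")
    for number, (src, tgt) in enumerate(zip(src_lines, tgt_lines), start=1):
        problems = placeholder_problems(src, tgt)
        if problems:
            yield number, problems


# Hypothetical prepared data: line 1 is fine, line 2 was split apart.
src = ["{}Processor{}", "{} Memory {}"]
tgt = ["{}processeur{}", "{} mémoire"]
for number, problems in check_corpus(src, tgt):
    print(number, problems)
```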
>
>  Tom
>
>
>  On Tue, 31 Jul 2012 10:08:43 -0400, John D Burger <[email protected]>
>  wrote:
>> Are there any such placeholders in your language modeling data and
>> your parallel training data?  If not, all the models are going to
>> treat them as unknown words.  In the case of the language model, it
>> doesn't surprise me too much that the placeholders all get pushed
>> together, as that will produce fewer discontiguous subsequences, which
>> the language model will prefer.
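To answer John's question for a given input, a minimal sketch that lists the
input tokens never seen in the training text. It assumes whitespace-tokenized
data and ignores casing or truecasing steps; the names and sample sentences
are hypothetical. Note that a placeholder attached to a word, such as
{}Processor{}, is its own word type, so it counts as known only if that exact
string occurs in the training data.

```python
from collections import Counter


def training_vocabulary(lines):
    """Count every whitespace-separated token in the training text."""
    vocab = Counter()
    for line in lines:
        vocab.update(line.split())
    return vocab


def unknown_tokens(sentence, vocab):
    """Tokens of `sentence` that never occur in the training data."""
    return [token for token in sentence.split() if token not in vocab]


# Hypothetical source-side training text and input sentence.
training = ["the processor is fast", "{}Memory{} low"]
vocab = training_vocabulary(training)
print(unknown_tokens("{}Processor{} is fast", vocab))  # ['{}Processor{}']
```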
>>
>> - John Burger
>>   MITRE
>>
>> On Jul 31, 2012, at 03:05, Henry Hu wrote:
>>
>>> Hi,
>>>
>>> I use a model to translate English to French. First, I replaced HTML
>>> tags such as <a> and <b> with the placeholder {}, like this:
>>>
>>> {}Processor{}
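A sketch of this kind of substitution, for illustration only: the regular
expression below is an assumption, not Henry's actual preprocessing script.
Because no spaces are inserted around the placeholder, it stays attached to
the word, so the result is a single whitespace-separated token.

```python
import re

# Matches an opening or closing HTML tag such as <b>, </b> or <a href="...">.
# Assumed pattern for illustration; it does not handle every HTML edge case.
TAG = re.compile(r"</?[A-Za-z][^>]*>")


def replace_tags(text, placeholder="{}"):
    """Replace every HTML tag in `text` with `placeholder`."""
    return TAG.sub(placeholder, text)


print(replace_tags("<b>Processor</b>"))                 # {}Processor{}
print(replace_tags('Click <a href="x.html">here</a>'))  # Click {}here{}
```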
>>>
>>> Then I decoded. To my confusion, I got this result:
>>>
>>> {}{} processeur
>>>
>>> instead of {}processeur{}. Why did the placeholder move? How can I
>>> keep it in place? Thanks for any suggestions.
>>>
>>> Henry
>>> _______________________________________________
>>> Moses-support mailing list
>>> [email protected]
>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>
>

