Hieu Hoang
Researcher
New York University, Abu Dhabi
http://www.hoang.co.uk/hieu

On 19 May 2015 at 14:39, Carla Parra <[email protected]> wrote:

> Dear Hieu,
>
> Thanks for looking into this! As far as I can see, your to-translate.txt
> is similar to mine (i.e. our tags look the same).
>
> The moses.ini files, however, are a bit different: ours was generated by
> EMS. Although our feature-function sections differ, the xml-input and
> placeholder-factor settings are identical. I have an additional weight
> section, and in my mapping steps I have "0 T 0", while you only have
> "T 0". Could any of this be the cause?
>
> What I have observed is that the tags are used correctly where they
> should be: the right translations are retrieved and the markup is
> removed. However, in some sentences a tag suddenly appears, as I
> illustrated yesterday in my example:
>
> "Allow simple password" is translated as "Permitir simple contraseña <ne
> translation="@tag@" entity="&lt;/1&gt;">@tag@</ne> ."
>
> The fact that in such cases the tags were not removed by the script
> that normally strips them makes me think that they are somehow learnt as
> individual tokens during training. I checked the phrase table and found
> entries like:
>
> "with ||| con <ne translation="@tag@" entity="&lt;2&gt;">@tag@</ne> |||
> 0.106785 0.447898 0.000321641 4.17018e-05 ||| 0-0 ||| 5 1660 1 ||| |||"
>
Ah, I see. In the training data, you should only insert the bare @tag@
placeholder, not the XML markup around it.

Have a look at how the example script does it:
   scripts/generic/ph_numbers.perl
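A minimal Python sketch of the idea, assuming the placeables are inline tags such as <1> or </1> that a regular expression can find, and that it runs on text where those tags are still single, unescaped tokens. The pattern PLACEABLE_RE and the function names are hypothetical, and this is not a port of ph_numbers.perl. The training text gets only the bare @tag@ token, while the decoder input gets the full <ne> markup, with the original text escaped into the entity attribute.

```python
import re
from html import escape

# Hypothetical pattern for inline-tag placeables such as <1>, </1> or <2/>;
# substitute the patterns your own data actually needs.
PLACEABLE_RE = re.compile(r"</?\d+/?>")
PLACEHOLDER = "@tag@"


def for_training(sentence):
    """Training side: replace each placeable with the bare placeholder token only."""
    return PLACEABLE_RE.sub(PLACEHOLDER, sentence)


def for_decoding(sentence):
    """Decoding side: wrap each placeable in <ne> markup for -xml-input.

    The original text is stored in the entity attribute, escaped so that
    characters such as <, > and " cannot break the XML.
    """
    def wrap(match):
        entity = escape(match.group(0), quote=True)
        return (f'<ne translation="{PLACEHOLDER}" entity="{entity}">'
                f"{PLACEHOLDER}</ne>")

    return PLACEABLE_RE.sub(wrap, sentence)


if __name__ == "__main__":
    line = "Allow <1> simple password </1>"
    print(for_training(line))   # Allow @tag@ simple password @tag@
    print(for_decoding(line))
```

Because only @tag@ reaches the word aligner and phrase extractor in this setup, phrase pairs can contain the placeholder but not markup fragments like the `con <ne translation="@tag@"` seen in Carla's phrase-table entries.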


> Sometimes the tag is not even complete (as if it had been tokenized,
> which in principle should have been prevented by protected tokenization
> with a list of patterns):
>
> "with the ||| con <ne translation="@tag@" ||| 0.0166851 0.0176098
> 0.000606043 0.00561383 ||| 0-0 ||| 32 881 1 ||| |||"
>
> I am not an expert, so my guess might be totally wrong, but this makes me
> think that MOSES somehow also used the text within the tags in training. In
> a previous email I explained that I had encountered problems when using
> Chris Dyer's FastAlign because it converted all special characters to their
> corresponding codes, so I commented out that loop. Could this be why
> MOSES uses the tags in training? How should I call the word aligner so
> that it ignores the tags?
>
>
> Best,
> Carla
>
> On 19.05.2015 11:58, Hieu Hoang wrote:
>
>> It looks OK to me; I'm not sure what could be wrong.
>>
>> I've added a daily test to ensure that the placeholder feature keeps
>> working in future. Perhaps you can compare its moses.ini and
>> to-translate.txt files with yours to see if there are any differences:
>>
>>
>> https://github.com/moses-smt/moses-regression-tests/tree/master/tests/phrase.placeholder
>>
>> On 19 May 2015 at 11:53, Carla Parra <[email protected]>
>> wrote:
>>
>>> Dear Hieu,
>>>
>>> Thanks for your reply. I attach the config file, my moses.ini (I
>>> think this is the one you want), and a few lines of our input
>>> file, already preprocessed. If you want the raw lines, I can also
>>> send them to you.
>>>
>>> I don't know if this is a related issue, but I tried the same
>>> strategy with forced translations (<np
>>> translation="German">Deutsch</np>), and this morning I observed
>>> the same thing: some tags suddenly appear in the translation.
>>>
>>> Thank you very much for your support!
>>>
>>> Carla
>>>
>>> On 19.05.2015 09:13, Hieu Hoang wrote:
>>> What is the exact command you used to decode? Can you please provide
>>> the moses.ini file and a few lines of your input data for us to look
>>> at?
>>>
>>> On 18 May 2015 at 15:35, Carla Parra <[email protected]>
>>> wrote:
>>>
>>> Dear all,
>>>
>>> We have just finished some experiments using placeables and observed
>>> several issues that may be worth sharing. I don't know whether anyone
>>> else has experienced the same or you were already aware of this, but
>>> just in case:
>>>
>>> (1) Special characters must be escaped in the "entity" value field.
>>> Otherwise, they cause XML parsing errors at tuning (though not at
>>> training!), and wrong values are retrieved from the tags (e.g. we had
>>> text with additional quotation marks, which caused the translation to
>>> stop at the first quotation mark instead of yielding the complete
>>> "entity" value we had encoded).
>>>
>>> (2) <ne> tags are added to sentences as if they had been treated as
>>> tokens during training (i.e. they are not ignored, even though they
>>> only contain the placeables). As an example, the English sentence
>>> "Allow simple password" is translated as "Permitir simple contraseña
>>> <ne translation="@tag@" entity="&lt;/1&gt;">@tag@</ne> ."
>>>
>>> While the first issue is our fault, we do not know what causes the
>>> second one. We followed the instructions on the MOSES advanced
>>> features page and thus specified extract-settings = "--Placeholder
>>> @tag@" in training and "-placeholder-factor 1 -xml-input exclusive"
>>> for the decoder and evaluation. Has anyone experienced the same thing,
>>> and/or does anyone know how to solve this issue?
>>>
>>> Thank you very much. Best regards,
>>>
>>> Carla
>>>
>>> --
>>> Carla Parra Escartín
>>> Marie Curie Experienced Researcher - EXPERT ITN
>>> http://expert-itn.eu/
>>> Hermes Traducciones
>>> _______________________________________________
>>> Moses-support mailing list
>>> [email protected]
>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>
>
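Carla's first point (escaping special characters in the "entity" value) can be illustrated with Python's standard library. The sketch below uses xml.etree.ElementTree, not the parser Moses itself uses, so it only shows general XML behaviour: an unescaped double quote inside the entity attribute makes the <ne> element malformed, while escaping the value with html.escape(..., quote=True) lets the parser recover the full original string. The helper names make_ne and parsed_entity and the sample value are hypothetical.

```python
from html import escape
import xml.etree.ElementTree as ET


def make_ne(entity, placeholder="@tag@", escape_value=True):
    """Hypothetical helper: build a <ne> element, optionally escaping the entity value."""
    value = escape(entity, quote=True) if escape_value else entity
    return f'<ne translation="{placeholder}" entity="{value}">{placeholder}</ne>'


def parsed_entity(markup):
    """Return the entity attribute as a strict XML parser reads it, or None on a parse error."""
    try:
        return ET.fromstring(markup).get("entity")
    except ET.ParseError:
        return None


if __name__ == "__main__":
    original = 'click "OK" & <1>'
    print(parsed_entity(make_ne(original, escape_value=False)))  # None (malformed XML)
    print(parsed_entity(make_ne(original)))                      # click "OK" & <1>
```

Escaping &, <, > and " is enough for double-quoted attributes; html.escape additionally escapes the single quote, which is harmless here.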