Dear Hieu,

Thanks for looking into this! As far as I can see, your to-translate.txt
looks similar to mine (i.e. our tags look the same).

The moses.ini files, however, are a bit different. Ours was generated by
EMS. While we differ in the feature functions section, the xml-input and
placeholder-factor settings are identical. I have an additional weight
section, and in my mapping-steps I have "0 T 0", while you only have
"T 0". Could any of this be the cause?
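
To make this comparison less error-prone, the two files can be diffed
section by section. The snippet below is only a minimal sketch: it
assumes the usual moses.ini layout ([section] headers followed by plain
value lines), and the helper names read_ini_sections and diff_sections,
as well as the sample contents, are made up for illustration.

```python
def read_ini_sections(text):
    """Parse Moses-style ini text into {section name: [value lines]}.

    Moses ini files use [section] headers followed by plain value lines,
    so configparser (which expects key=value pairs) is not a good fit.
    """
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            sections.setdefault(current, [])
        elif current is not None:
            sections[current].append(line)
    return sections


def diff_sections(a, b):
    """Return the names of sections that are missing on one side or differ."""
    names = sorted(set(a) | set(b))
    return [name for name in names if a.get(name) != b.get(name)]


# Illustrative (made-up) contents mirroring the differences described above.
mine = read_ini_sections("""
[mapping]
0 T 0
[xml-input]
exclusive
[placeholder-factor]
1
[weight]
WordPenalty0= -1
""")
theirs = read_ini_sections("""
[mapping]
T 0
[xml-input]
exclusive
[placeholder-factor]
1
""")
print(diff_sections(mine, theirs))  # prints ['mapping', 'weight']
```

In practice the two strings would be replaced by the contents of the
real files, read with open(path, encoding="utf-8").read().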

What I have observed is that, where the tags should be used, they are
used correctly: the right translations are retrieved and the markup is
removed. However, in some sentences a tag suddenly appears, as I
illustrated yesterday in my example:

"Allow simple password" is translated as "Permitir simple contraseña
<ne translation="@tag@" entity="&lt;/1&gt;">@tag@</ne> ."

The fact that in such cases the tags have not been removed by the script
that is supposed to strip them makes me think that they are somehow
learnt during training as individual tokens. I have checked the phrase
table and found entries like:

"with ||| con <ne translation="@tag@" entity="&lt;2&gt;">@tag@</ne> ||| 
0.106785 0.447898 0.000321641 4.17018e-05 ||| 0-0 ||| 5 1660 1 ||| |||"

Sometimes the tag is not complete (as if it had been tokenized, which in
principle should have been prevented by using the protected tokenization
with a list of patterns):

"with the ||| con <ne translation="@tag@" ||| 0.0166851 0.0176098 
0.000606043 0.00561383 ||| 0-0 ||| 32 881 1 ||| |||"
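
A check like this can be automated over the whole phrase table. The
following is a rough sketch, assuming the "|||"-separated phrase-table
format shown above; the function name find_markup_entries and the list
of fragments to look for are my own choices, not part of Moses.

```python
import re

# Fragments that would indicate placeholder markup leaked into training data.
MARKUP = re.compile(r'<ne\b|</ne>|translation="|entity="')


def find_markup_entries(lines):
    """Yield (line_no, source, target) for phrase pairs containing markup."""
    for line_no, line in enumerate(lines, start=1):
        fields = [field.strip() for field in line.split("|||")]
        if len(fields) < 2:
            continue  # not a phrase-table entry
        source, target = fields[0], fields[1]
        if MARKUP.search(source) or MARKUP.search(target):
            yield line_no, source, target


# The two entries quoted above, used as sample input.
sample = [
    'with ||| con <ne translation="@tag@" entity="&lt;2&gt;">@tag@</ne> ||| '
    '0.106785 0.447898 0.000321641 4.17018e-05 ||| 0-0 ||| 5 1660 1 ||| |||',
    'with the ||| con <ne translation="@tag@" ||| 0.0166851 0.0176098 '
    '0.000606043 0.00561383 ||| 0-0 ||| 32 881 1 ||| |||',
]
for line_no, source, target in find_markup_entries(sample):
    print(line_no, repr(source), "->", repr(target))
```

For a real (gzipped) phrase table, the list could be replaced by a file
object opened with gzip.open(path, "rt", encoding="utf-8"), so that the
table is streamed line by line rather than loaded into memory.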

I am not an expert, so my guess might be totally wrong, but this makes
me think that MOSES somehow also used the text within the tags in
training. In a previous email I explained that I had encountered
problems when using Chris Dyer's FastAlign because it converted all
special characters to their corresponding codes, so I commented out that
loop. Could this be the reason why MOSES is using the tags in training?
How should I call the word aligner so that it ignores the tags?
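
One possible workaround, assuming the markup itself ended up in the
training corpus, would be to collapse each element to the bare
placeholder token before running the aligner. This is only a sketch of
that idea, not something I have confirmed to be the intended workflow;
the name collapse_placeholders is hypothetical, and the pattern assumes
that attribute values are escaped (i.e. contain no raw ">" character).

```python
import re

# Matches a complete <ne ...>TOKEN</ne> element and captures the placeholder
# token (e.g. @tag@). Assumes attribute values contain no unescaped ">".
NE_ELEMENT = re.compile(r'<ne\b[^>]*>\s*(@[^@\s<]+@)\s*</ne>')


def collapse_placeholders(sentence):
    """Replace each <ne ...>TOKEN</ne> element with the bare TOKEN."""
    return NE_ELEMENT.sub(r'\1', sentence)


line = 'con <ne translation="@tag@" entity="&lt;2&gt;">@tag@</ne> .'
print(collapse_placeholders(line))  # prints: con @tag@ .
```

Applied to both sides of the parallel corpus, this would leave the
aligner with a single @tag@ token per placeable instead of the pieces of
the XML element.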


Best,
Carla

On 19.05.2015 11:58, Hieu Hoang wrote:
> It looks OK to me; I'm not sure what could be wrong.
> 
> I've added a daily test to ensure that the placeholder will keep working
> in future. Perhaps you can have a look at its moses.ini and
> to-translate.txt files to see if there are any differences from yours:
>  
> https://github.com/moses-smt/moses-regression-tests/tree/master/tests/phrase.placeholder
> [4]
> 
> Hieu Hoang
> Researcher
> 
> New York University, Abu Dhabi
> 
> http://www.hoang.co.uk/hieu [1]
> 
> On 19 May 2015 at 11:53, Carla Parra <[email protected]>
> wrote:
> 
>> Dear Hieu,
>> 
>> thanks for your reply. I attach the config file, my moses.ini (I
>> think this is the one you want to get), and a few lines of our input
>> file, already preprocessed. If you want the RAW lines I can also
>> send them to you.
>> 
>> I don't know if this will be a similar issue, but I tried the same
>> strategy using the forced translations (<np
>> translation="German">Deutsch</np>), and this morning I have observed
>> the same, some tags are suddenly appearing in the translation.
>> 
>> Thank you very much for your support!
>> 
>> Carla
>> 
>> On 19.05.2015 09:13, Hieu Hoang wrote:
>> What is the exact command you used to decode? Can you please provide
>> the moses.ini file and a few lines of your input data for us to look
>> at?
>> 
>> Hieu Hoang
>> Researcher
>> 
>> New York University, Abu Dhabi
>> 
>> http://www.hoang.co.uk/hieu [1]
>> 
>> On 18 May 2015 at 15:35, Carla Parra <[email protected]>
>> wrote:
>> 
>> Dear all,
>> 
>> We just finished some experiments using placeables, and we have
>> observed several issues that may be worth sharing. I don't know if
>> someone has experienced the same, or if you were already aware of
>> this, but just in case:
>> 
>> (1) Special characters must be escaped in the "entity" value field.
>> Otherwise, they cause XML parsing errors at tuning (not at training,
>> though!), and wrong values are retrieved from the tags (e.g. we had
>> text with additional quotation marks, and this caused the translation
>> to stop at the first quotation mark, not yielding the complete
>> "entity" value we had encoded).
>> 
>> (2) <ne> tags are added to sentences as if they were treated as
>> tokens during training (i.e. not ignored, even though they just
>> contain the placeables). As an example, the English sentence "Allow
>> simple password" is translated as "Permitir simple contraseña <ne
>> translation="@tag@" entity="&lt;/1&gt;">@tag@</ne> ."
>> 
>> While the first issue is our fault, we do not know what causes the
>> second one. We have followed the instructions at the MOSES advanced
>> features site and thus specified "extract-settings = "--Placeholder
>> @tag@"" in training and "-placeholder-factor 1 -xml-input exclusive"
>> in the decoder and evaluation. Has anyone experienced the same thing
>> and/or does anyone know how to solve this issue?
>> 
>> Thank you very much. Best regards,
>> 
>> Carla
>> 
>> --
>> Carla Parra Escartín
>> Marie Curie Experienced Researcher - EXPERT ITN
>> http://expert-itn.eu/ [2]
>> Hermes Traducciones
>> _______________________________________________
>> Moses-support mailing list
>> [email protected]
>> http://mailman.mit.edu/mailman/listinfo/moses-support [3]
>> 
> 
>  --
>  Carla Parra Escartín
>  Marie Curie Experienced Researcher - EXPERT ITN
>  http://expert-itn.eu/ [2]
>  Hermes Traducciones
> 
> 
> Links:
> ------
> [1] http://www.hoang.co.uk/hieu
> [2] http://expert-itn.eu/
> [3] http://mailman.mit.edu/mailman/listinfo/moses-support
> [4]
> https://github.com/moses-smt/moses-regression-tests/tree/master/tests/phrase.placeholder

-- 
Carla Parra Escartín
Marie Curie Experienced Researcher - EXPERT ITN
http://expert-itn.eu/
Hermes Traducciones