I see! Should have thought of that... I will modify my script and change 
it accordingly. Hopefully next round it will work better. Thanks!
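
A minimal sketch of the change suggested in the quoted reply below, assuming the training corpus currently contains the full inline markup shown in the phrase-table entries. The function name `strip_ne_markup` and the regular expression are illustrative only and are not part of Moses or of scripts/generic/ph_numbers.perl. The idea is to keep just the @tag@ token in the training data and leave the XML for the decoder input.

```python
import re

# Matches the inline markup quoted in this thread, e.g.
#   <ne translation="@tag@" entity="&lt;2&gt;">@tag@</ne>
# Assumption: attribute values are already escaped, so they contain no raw '>'.
NE_MARKUP = re.compile(r'<ne\s[^>]*>(.*?)</ne>')


def strip_ne_markup(line: str) -> str:
    """Replace every <ne ...>X</ne> element with its inner text X."""
    return NE_MARKUP.sub(r'\1', line)


if __name__ == "__main__":
    annotated = 'con <ne translation="@tag@" entity="&lt;2&gt;">@tag@</ne> .'
    # The training corpus would then contain only the placeholder token.
    print(strip_ne_markup(annotated))
```

This would be applied to both sides of the parallel corpus before alignment and training, while the text passed to the decoder keeps the full <ne> markup.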

Carla

On 19.05.2015 12:45, Hieu Hoang wrote:
> Hieu Hoang
> Researcher
> 
> New York University, Abu Dhabi
> 
> http://www.hoang.co.uk/hieu [2]
> 
> On 19 May 2015 at 14:39, Carla Parra <[email protected]>
> wrote:
> 
>> Dear Hieu,
>> 
>> thanks for looking into this! As far as I can see, your
>> to-translate.txt seems similar to mine (i.e. our tags look the
>> same).
>> 
>> The moses.ini files however are a bit different. Ours was generated
>> by EMS. While we differ in the feature functions section, the
>> xml-input and the placeholder-factor are identical. I have an
>> additional weight section and in my mapping-steps I have "0 T 0",
>> while you only have "T 0". Could any of this be the cause?
>> 
>> What I have observed is that the tags were correctly used where
>> they should be used, thus retrieving the right translations, and
>> the markup was removed. However, in some sentences a tag suddenly
>> appears, as I illustrated yesterday in my example:
>> 
>> "Allow simple password", is translated as "Permitir simple
>> contraseña <ne translation="@tag@" entity="&lt;/1&gt;">@tag@</ne>
>> ."
>> 
>> The fact that in such cases the tags have not been removed by the
>> script doing so makes me think that they are somehow learnt in the
>> training process as individual tokens. I have checked the phrase
>> table, and I found things like:
>> 
>> "with ||| con <ne translation="@tag@" entity="&lt;2&gt;">@tag@</ne>
>> ||| 0.106785 0.447898 0.000321641 4.17018e-05 ||| 0-0 ||| 5 1660 1
>> ||| |||"
> 
> Ah, I see. In the training data, you should only add the @tag@, not
> the xml part.
> 
> You should look at how the example script works:
>    scripts/generic/ph_numbers.perl
> 
>> Sometimes the tag is not complete (as if it had been tokenized,
>> which in principle was prevented by using the protected tokenization
>> and a list of patterns):
>> 
>> "with the ||| con <ne translation="@tag@" ||| 0.0166851 0.0176098
>> 0.000606043 0.00561383 ||| 0-0 ||| 32 881 1 ||| |||"
>> 
>> I am not an expert, so my guess might be totally wrong, but this
>> makes me think that somehow MOSES also used the text within the tags
>> in training. In a previous email I explained that I had encountered
>> problems when using Chris Dyer's FastAlign because it converted all
>> special characters to their corresponding codes, so I commented out
>> that loop. Now I wonder whether this might be the cause of MOSES
>> using the tags in training? How should I call the word aligner so
>> that it ignores the tags?
>> 
>> Best,
>> Carla
>> 
>> On 19.05.2015 11:58, Hieu Hoang wrote:
>> it looks ok to me, not sure what could be wrong.
>> 
>> i've added a daily test to ensure that the placeholder will work in
>> future. Perhaps you can have a look at the moses.ini file and
>> to-translate.txt files to see if there are any differences with
>> yours
>>
>> https://github.com/moses-smt/moses-regression-tests/tree/master/tests/phrase.placeholder [1]
>> 
>> Hieu Hoang
>> Researcher
>> 
>> New York University, Abu Dhabi
>> 
>> http://www.hoang.co.uk/hieu [2]
>> 
>> On 19 May 2015 at 11:53, Carla Parra <[email protected]>
>> wrote:
>> 
>> Dear Hieu,
>> 
>> thanks for your reply. I attach the config file, my moses.ini (I
>> think this is the one you want to get), and a few lines of our
>> input file, already preprocessed. If you want the RAW lines I can
>> also send them to you.
>> 
>> I don't know if this will be a similar issue, but I tried the same
>> strategy using the forced translations (<np
>> translation="German">Deutsch</np>), and this morning I have
>> observed the same: some tags are suddenly appearing in the
>> translation.
>> 
>> Thank you very much for your support!
>> 
>> Carla
>> 
>> On 19.05.2015 09:13, Hieu Hoang wrote:
>> what is the exact command you used to decode? Can you please
>> provide the moses.ini file and a few lines of your input data for
>> us to look at.
>> 
>> Hieu Hoang
>> Researcher
>> 
>> New York University, Abu Dhabi
>> 
>> http://www.hoang.co.uk/hieu [2]
>> 
>> On 18 May 2015 at 15:35, Carla Parra <[email protected]>
>> wrote:
>> 
>> Dear all,
>> 
>> we just finished some experiments using placeables, and we have
>> observed several issues that may be worth sharing. I don't know if
>> someone has experienced the same, or if you were already aware of
>> this, but just in case:
>> 
>> (1) Special characters must be escaped in the "entity" value field.
>> Otherwise, they cause XML parsing errors at tuning (not at training,
>> though!), and wrong values are retrieved from the tags (e.g. we had
>> text with additional quotation marks, and this caused the
>> translation to stop at the first quotation mark, not yielding the
>> complete "entity" value we had encoded).
>> 
>> (2) <ne> tags are added to sentences as if they had been treated as
>> tokens during training (i.e. not ignored, as they just contain the
>> placeables). As an example, the English sentence "Allow simple
>> password" is translated as "Permitir simple contraseña
>> <ne translation="@tag@" entity="&lt;/1&gt;">@tag@</ne> ."
>> 
>> While the first issue is our fault, we do not know what causes the
>> second one. We have followed the instructions at the MOSES advanced
>> features site and thus specified "extract-settings = "--Placeholder
>> @tag@"" in training and "-placeholder-factor 1 -xml-input
>> exclusive" in the decoder and evaluation. Has anyone experienced the
>> same thing, or does anyone know how to solve this issue?
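
Point (1) above can be sketched as follows. This is a minimal illustration under stated assumptions, not the script used in these experiments: the helper names `escape_entity` and `make_ne_tag` are hypothetical, and whether the decoder turns the escaped value back into the raw characters on output is assumed here, not verified. Only the Python standard library call `xml.sax.saxutils.escape` is used.

```python
from xml.sax.saxutils import escape


def escape_entity(value: str) -> str:
    """Escape &, < and > plus double quotes so the value fits in a
    double-quoted XML attribute."""
    return escape(value, {'"': '&quot;'})


def make_ne_tag(entity: str, placeholder: str = "@tag@") -> str:
    """Build the <ne> markup used in this thread for one placeable.

    `entity` is the raw text the placeholder stands for (hypothetical
    helper; the attribute names follow the examples in the thread).
    """
    return (
        f'<ne translation="{placeholder}" '
        f'entity="{escape_entity(entity)}">{placeholder}</ne>'
    )


if __name__ == "__main__":
    # An inline tag and a value containing quotation marks.
    print(make_ne_tag("</1>"))
    print(make_ne_tag('say "hello"'))
```

With the quotation marks escaped, the attribute value no longer ends at the first quotation mark inside the original text.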
>> 
>> Thank you very much. Best regards,
>> 
>> Carla
>> 
>> --
>> Carla Parra Escartín
>> Marie Curie Experienced Researcher - EXPERT ITN
>> http://expert-itn.eu/ [3]
>> Hermes Traducciones
>> _______________________________________________
>> Moses-support mailing list
>> [email protected]
>> http://mailman.mit.edu/mailman/listinfo/moses-support [4]
>
> Links:
> ------
> [1]
> https://github.com/moses-smt/moses-regression-tests/tree/master/tests/phrase.placeholder
> [2] http://www.hoang.co.uk/hieu
> [3] http://expert-itn.eu/
> [4] http://mailman.mit.edu/mailman/listinfo/moses-support

-- 
Carla Parra Escartín
Marie Curie Experienced Researcher - EXPERT ITN
http://expert-itn.eu/
Hermes Traducciones
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
