Dear Hieu, thanks for looking into this! As far as I can see, your to-translate.txt seems similar to mine (i.e. our tags look the same).
The moses.ini files, however, are a bit different. Ours was generated by EMS. While we differ in the feature functions section, the xml-input and placeholder-factor settings are identical. I have an additional weight section, and in my mapping-steps I have "0 T 0", while you only have "T 0". Could any of this be the cause?

What I have observed is that the tags were used correctly where they should be: the right translations were retrieved and the markup was removed. However, in some sentences a tag suddenly appears, as I illustrated yesterday in my example. "Allow simple password" is translated as:

Permitir simple contraseña <ne translation="@tag@" entity="</1>">@tag@</ne> .

The fact that in such cases the tags have not been removed by the script that strips them makes me think that they are somehow learnt as individual tokens during training. I have checked the phrase table, and I found entries like:

with ||| con <ne translation="@tag@" entity="<2>">@tag@</ne> ||| 0.106785 0.447898 0.000321641 4.17018e-05 ||| 0-0 ||| 5 1660 1 ||| |||

Sometimes the tag is not complete (as if it had been tokenized, which in principle was prevented by using the protected tokenization and a list of patterns):

with the ||| con <ne translation="@tag@" ||| 0.0166851 0.0176098 0.000606043 0.00561383 ||| 0-0 ||| 32 881 1 ||| |||

I am not an expert, so my guess might be totally wrong, but this makes me think that MOSES somehow also used the text within the tags in training. In a previous email I explained that I had encountered problems when using Chris Dyer's FastAlign because it converted all special characters to their corresponding codes, so I commented out that loop. Now I wonder whether this might be the reason MOSES is using the tags in training. How should I call the word aligner so that it ignores the tags?

Best,
Carla

On 19.05.2015 11:58, Hieu Hoang wrote:
> it looks ok to me, not sure what could be wrong.
>
> i've added a daily test to ensure that the placeholder will work in
> future. Perhaps you can have a look at the moses.ini file and
> to-translate.txt files to see if there are any differences with yours
>
> https://github.com/moses-smt/moses-regression-tests/tree/master/tests/phrase.placeholder
>
> Hieu Hoang
> Researcher
> New York University, Abu Dhabi
> http://www.hoang.co.uk/hieu
>
> On 19 May 2015 at 11:53, Carla Parra <[email protected]> wrote:
>
>> Dear Hieu,
>>
>> thanks for your reply. I attach the config file, my moses.ini (I
>> think this is the one you want to get), and a few lines of our input
>> file, already preprocessed. If you want the RAW lines I can also
>> send them to you.
>>
>> I don't know if this will be a similar issue, but I tried the same
>> strategy using the forced translations (<np
>> translation="German">Deutsch</np>), and this morning I have observed
>> the same: some tags are suddenly appearing in the translation.
>>
>> Thank you very much for your support!
>>
>> Carla
>>
>> On 19.05.2015 09:13, Hieu Hoang wrote:
>> what is the exact command you used to decode? Can you please provide
>> the moses.ini file and a few lines of your input data for us to look
>> at.
>>
>> Hieu Hoang
>> Researcher
>> New York University, Abu Dhabi
>> http://www.hoang.co.uk/hieu
>>
>> On 18 May 2015 at 15:35, Carla Parra <[email protected]> wrote:
>>
>> Dear all,
>>
>> we just finished some experiments using placeables, and we have
>> observed several issues that may be worth sharing. I don't know if
>> someone has experienced the same, or you were already aware of this,
>> but just in case:
>>
>> (1) Special characters must be escaped in the "entity" value field.
>> Otherwise, they cause XML parsing errors at tuning (not at training,
>> though!), and wrong values are retrieved from the tags (e.g. we had
>> text with additional quotation marks, and this caused the
>> translation to stop at the first quotation mark, not yielding the
>> complete "entity" value we had encoded).
>>
>> (2) <ne> tags are added to sentences as if they were computed as
>> tokens during training (i.e. not ignored, as they just contain the
>> placeables). As an example, the English sentence "Allow simple
>> password" is translated as "Permitir simple contraseña <ne
>> translation="@tag@" entity="</1>">@tag@</ne> ."
>>
>> While the first issue is our fault, we do not know what causes the
>> second one. We have followed the instructions at the MOSES advanced
>> features site and thus specified "extract-settings = "--Placeholder
>> @tag@"" in training and "-placeholder-factor 1 -xml-input
>> exclusive" in the decoder and evaluation. Has anyone experienced the
>> same thing and/or know how to solve this issue?
>>
>> Thank you very much. Best regards,
>>
>> Carla
>>
>> --
>> Carla Parra Escartín
>> Marie Curie Experienced Researcher - EXPERT ITN
>> http://expert-itn.eu/
>> Hermes Traducciones
>> _______________________________________________
>> Moses-support mailing list
>> [email protected]
>> http://mailman.mit.edu/mailman/listinfo/moses-support

--
Carla Parra Escartín
Marie Curie Experienced Researcher - EXPERT ITN
http://expert-itn.eu/
Hermes Traducciones
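
P.S. The phrase-table check described in this message could be automated along the following lines. This is only a minimal sketch, not part of MOSES: the function name leaked_entries, the regular expression, and the third sample entry are assumptions made for illustration, and the pattern only covers markup shaped like the <ne ...> tags quoted in this thread. It flags phrase pairs whose source or target side contains a whole or partial tag, while leaving the bare @tag@ placeholder alone, since that token is expected to appear after extraction with --Placeholder.

```python
import re
from typing import Iterable, Iterator, Tuple

# Assumption: leaked markup looks like the <ne ...> tags quoted in this thread.
# The bare "@tag@" placeholder is deliberately not flagged.
TAG_FRAGMENT = re.compile(r'</?ne\b|\b(?:translation|entity)="')


def leaked_entries(lines: Iterable[str]) -> Iterator[Tuple[int, str, str]]:
    """Yield (line_number, source, target) for entries containing tag fragments."""
    for number, line in enumerate(lines, start=1):
        # Phrase-table fields are separated by "|||"; the first two are the phrases.
        fields = [field.strip() for field in line.split("|||")]
        if len(fields) < 2:
            continue  # skip malformed lines
        source, target = fields[0], fields[1]
        if TAG_FRAGMENT.search(source) or TAG_FRAGMENT.search(target):
            yield number, source, target


# The first two entries are the ones quoted in this message;
# the third is a made-up clean entry for contrast.
SAMPLE = [
    'with ||| con <ne translation="@tag@" entity="<2>">@tag@</ne> ||| '
    '0.106785 0.447898 0.000321641 4.17018e-05 ||| 0-0 ||| 5 1660 1 ||| |||',
    'with the ||| con <ne translation="@tag@" ||| '
    '0.0166851 0.0176098 0.000606043 0.00561383 ||| 0-0 ||| 32 881 1 ||| |||',
    'simple password ||| contraseña simple ||| 0.5 0.5 0.5 0.5 ||| 0-1 1-0 ||| 1 1 1 ||| |||',
]

for number, source, target in leaked_entries(SAMPLE):
    print(f"line {number}: {source!r} -> {target!r}")
```

For a real phrase table, which is usually gzipped, the same function can be fed a file object opened with gzip.open(path, "rt", encoding="utf-8"). A large number of hits would suggest that the markup reached phrase extraction as ordinary tokens.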
