I see! Should have thought of that... I will modify my script and change it accordingly. Hopefully next round it will work better. Thanks!
Carla

On 19.05.2015 12:45, Hieu Hoang wrote:
> Hieu Hoang
> Researcher
>
> New York University, Abu Dhabi
>
> http://www.hoang.co.uk/hieu
>
> On 19 May 2015 at 14:39, Carla Parra <[email protected]> wrote:
>
>> Dear Hieu,
>>
>> thanks for looking into this! As far as I can see, your
>> to-translate.txt seems similar to mine (i.e. our tags look the
>> same).
>>
>> The moses.ini files however are a bit different. Ours was generated
>> by EMS. While we differ in the feature functions section, the
>> xml-input and the placeholder-factor are identical. I have an
>> additional weight section and in my mapping-steps I have "0 T 0",
>> while you only have "T 0". Could any of this be the cause?
>>
>> What I have observed is that the tags were correctly used where
>> they should be used, thus retrieving the right translations, and
>> the markup was removed. However, in some sentences a tag suddenly
>> appears, as I illustrated yesterday in my example:
>>
>> "Allow simple password", is translated as "Permitir simple
>> contraseña <ne translation="@tag@" entity="</1>">@tag@</ne> ."
>>
>> The fact that in such cases the tags have not been removed by the
>> script doing so makes me think that they are somehow learnt in the
>> training process as individual tokens. I have checked the phrase
>> table, and I found things like:
>>
>> "with ||| con <ne translation="@tag@" entity="<2>">@tag@</ne> ||| 0.106785 0.447898 0.000321641 4.17018e-05 ||| 0-0 ||| 5 1660 1 ||| |||"
>
> Ah, I see. In the training data, you should only add the @tag@, not
> the xml part.
>
> You should look at how the example script works:
> scripts/generic/ph_numbers.perl
>
>> Sometimes the tag is not complete (as if it had been tokenized,
>> which in principle was prevented by using the protected tokenization
>> and a list of patterns):
>>
>> "with the ||| con <ne translation="@tag@" ||| 0.0166851 0.0176098 0.000606043 0.00561383 ||| 0-0 ||| 32 881 1 ||| |||"
>>
>> I am not an expert, so my guess might be totally wrong, but this
>> makes me think that somehow MOSES also used the text within the tags
>> in training. In a previous email I explained that I had encountered
>> problems when using Chris Dyer's FastAlign because it converted all
>> special characters to their corresponding codes, so I commented out
>> that loop. Now I wonder whether this might be the cause of MOSES
>> using the tags in training? How should I call the word aligner so
>> that it ignores the tags?
>>
>> Best,
>> Carla
>>
>> On 19.05.2015 11:58, Hieu Hoang wrote:
>> it looks ok to me, not sure what could be wrong.
>>
>> i've added a daily test to ensure that the placeholder will work in
>> future. Perhaps you can have a look at the moses.ini file and
>> to-translate.txt files to see if there are any differences with
>> yours
>>
>> https://github.com/moses-smt/moses-regression-tests/tree/master/tests/phrase.placeholder
>>
>> On 19 May 2015 at 11:53, Carla Parra <[email protected]> wrote:
>>
>> Dear Hieu,
>>
>> thanks for your reply. I attach the config file, my moses.ini (I
>> think this is the one you want to get), and a few lines of our input
>> file, already preprocessed. If you want the RAW lines I can also
>> send them to you.
>>
>> I don't know if this will be a similar issue, but I tried the same
>> strategy using the forced translations
>> (<np translation="German">Deutsch</np>), and this morning I have
>> observed the same: some tags are suddenly appearing in the translation.
>>
>> Thank you very much for your support!
>>
>> Carla
>>
>> On 19.05.2015 09:13, Hieu Hoang wrote:
>> what is the exact command you used to decode? Can you please provide
>> the moses.ini file and a few lines of your input data for us to look
>> at.
>>
>> On 18 May 2015 at 15:35, Carla Parra <[email protected]> wrote:
>>
>> Dear all,
>>
>> we just finished some experiments using placeables, and we have
>> observed several issues that may be worth sharing. I don't know if
>> someone has experienced the same, or you were already aware of this,
>> but just in case:
>>
>> (1) Special characters must be escaped in the "entity" value field.
>> Otherwise, they cause XML parsing errors at tuning (not at training,
>> though!), and wrong values are retrieved from the tags (e.g. we had
>> text with additional quotation marks, and this caused the translation
>> to stop at the first quotation mark, not yielding the complete
>> "entity" value we had encoded).
>>
>> (2) <ne> tags are added to sentences as if they were computed as
>> tokens during training (i.e. not ignored, as they just contain the
>> placeables). As an example, the English sentence "Allow simple
>> password" is translated as "Permitir simple contraseña
>> <ne translation="@tag@" entity="</1>">@tag@</ne> ."
>>
>> While the first issue is our fault, we do not know what causes the
>> second one.
>> We have followed the instructions at the MOSES advanced
>> features site and thus specified "extract-settings = "--Placeholder
>> @tag@"" in training and "-placeholder-factor 1 -xml-input exclusive"
>> in the decoder and evaluation. Has anyone experienced the same thing
>> and/or know how to solve this issue?
>>
>> Thank you very much. Best regards,
>>
>> Carla
>>
>> --
>> Carla Parra Escartín
>> Marie Curie Experienced Researcher - EXPERT ITN
>> http://expert-itn.eu/
>> Hermes Traducciones
>> _______________________________________________
>> Moses-support mailing list
>> [email protected]
>> http://mailman.mit.edu/mailman/listinfo/moses-support

--
Carla Parra Escartín
Marie Curie Experienced Researcher - EXPERT ITN
http://expert-itn.eu/
Hermes Traducciones
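
For anyone finding this thread later, here is a rough sketch of the two preprocessing steps discussed above: the training corpus gets only the bare @tag@ token, while the decoder input gets the <ne> markup with the "entity" value escaped. It is not the script used in the thread and not part of Moses. The placeable pattern (<1>, </1>, <2>, ...) is only guessed from the entity values quoted above, and the function names are made up for illustration. scripts/generic/ph_numbers.perl remains the reference to follow.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical placeable pattern, guessed from the entity values in the
# thread (<1>, </1>, <2>). Adjust it to whatever your data contains.
PLACEABLE = re.compile(r"</?\d+>")
PLACEHOLDER = "@tag@"


def ne_tag(entity):
    """Build the <ne> markup for decoder input, escaping &, <, > and double quotes."""
    safe = escape(entity, {'"': "&quot;"})
    return f'<ne translation="{PLACEHOLDER}" entity="{safe}">{PLACEHOLDER}</ne>'


def for_training(line):
    """Training corpus: replace each placeable with the bare placeholder token."""
    return PLACEABLE.sub(PLACEHOLDER, line)


def for_decoding(line):
    """Decoder input: wrap each placeable in <ne> markup carrying the original value."""
    return PLACEABLE.sub(lambda m: ne_tag(m.group(0)), line)


if __name__ == "__main__":
    raw = "Allow <1> simple password </1> ."
    print(for_training(raw))   # goes into the parallel corpus
    print(for_decoding(raw))   # goes to the decoder with -xml-input exclusive
```

The point of the split is that the phrase table should only ever see @tag@ as a token. The XML wrapper exists only in the text handed to the decoder, so it cannot be learnt as a phrase during training.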
