Hieu Hoang
Researcher
New York University, Abu Dhabi
http://www.hoang.co.uk/hieu
On 19 May 2015 at 14:39, Carla Parra <[email protected]> wrote:

> Dear Hieu,
>
> thanks for looking into this! As far as I can see, your to-translate.txt
> seems similar to mine (i.e. our tags look the same).
>
> The moses.ini files, however, are a bit different. Ours was generated by
> EMS. While we differ in the feature functions section, the xml-input and
> the placeholder-factor are identical. I have an additional weight section
> and in my mapping-steps I have "0 T 0", while you only have "T 0". Could
> any of this be the cause?
>
> What I have observed is that the tags were correctly used where they
> should be used, thus retrieving the right translations, and the markup
> was removed. However, in some sentences a tag suddenly appears, as I
> illustrated yesterday in my example:
>
> "Allow simple password", is translated as "Permitir simple contraseña <ne
> translation="@tag@" entity="</1>">@tag@</ne> ."
>
> The fact that in such cases the tags have not been removed by the script
> that should remove them makes me think that they are somehow learnt in
> the training process as individual tokens. I have checked the phrase
> table, and I found things like:
>
> "with ||| con <ne translation="@tag@" entity="<2>">@tag@</ne> |||
> 0.106785 0.447898 0.000321641 4.17018e-05 ||| 0-0 ||| 5 1660 1 ||| |||"
>

Ah, I see. In the training data, you should only insert the @tag@
placeholder, not the XML markup. Have a look at how the example script
scripts/generic/ph_numbers.perl works.

> Sometimes the tag is not complete (as if it had been tokenized, which in
> principle was prevented by using the protected tokenization and a list of
> patterns):
>
> "with the ||| con <ne translation="@tag@" ||| 0.0166851 0.0176098
> 0.000606043 0.00561383 ||| 0-0 ||| 32 881 1 ||| |||"
>
> I am not an expert, so my guess might be totally wrong, but this makes me
> think that somehow MOSES also used the text within the tags in training.
> In a previous email I explained that I had encountered problems when
> using Chris Dyer's FastAlign because it converted all special characters
> to their corresponding codes, so I commented out that loop. Now I wonder
> whether this might be the cause of MOSES using the tags in training? How
> should I call the word aligner so that it ignores the tags?
>
> Best,
> Carla
>
> On 19.05.2015 11:58, Hieu Hoang wrote:
>
>> it looks ok to me, not sure what could be wrong.
>>
>> i've added a daily test to ensure that the placeholder will work in
>> future. Perhaps you can have a look at the moses.ini file and
>> to-translate.txt files to see if there are any differences with yours
>>
>> https://github.com/moses-smt/moses-regression-tests/tree/master/tests/phrase.placeholder
>>
>> Hieu Hoang
>> Researcher
>> New York University, Abu Dhabi
>> http://www.hoang.co.uk/hieu
>>
>> On 19 May 2015 at 11:53, Carla Parra <[email protected]> wrote:
>>
>>> Dear Hieu,
>>>
>>> thanks for your reply. I attach the config file, my moses.ini (I
>>> think this is the one you want to get), and a few lines of our input
>>> file, already preprocessed. If you want the RAW lines I can also
>>> send them to you.
>>>
>>> I don't know if this will be a similar issue, but I tried the same
>>> strategy using the forced translations (<np
>>> translation="German">Deutsch</np>), and this morning I have observed
>>> the same: some tags are suddenly appearing in the translation.
>>>
>>> Thank you very much for your support!
>>>
>>> Carla
>>>
>>> On 19.05.2015 09:13, Hieu Hoang wrote:
>>> what is the exact command you used to decode? Can you please provide
>>> the moses.ini file and a few lines of your input data for us to look
>>> at.
>>>
>>> Hieu Hoang
>>> Researcher
>>> New York University, Abu Dhabi
>>> http://www.hoang.co.uk/hieu
>>>
>>> On 18 May 2015 at 15:35, Carla Parra <[email protected]> wrote:
>>>
>>> Dear all,
>>>
>>> we just finished some experiments using placeables, and we have
>>> observed several issues that may be worth sharing. I don't know if
>>> someone has experienced the same, or you were already aware of this,
>>> but just in case:
>>>
>>> (1) Special characters must be escaped in the "entity" value field.
>>> Otherwise, they cause XML parsing errors at tuning (not at training,
>>> though!), and wrong values are retrieved from the tags (e.g. we had
>>> text with additional quotation marks, and this caused the translation
>>> to stop at the first quotation mark, not yielding the complete
>>> "entity" value we had encoded).
>>>
>>> (2) <ne> tags are added to sentences as if they were computed as
>>> tokens during training (i.e. not ignored, as they just contain the
>>> placeables). As an example, the English sentence "Allow simple
>>> password", is translated as "Permitir simple contraseña <ne
>>> translation="@tag@" entity="</1>">@tag@</ne> ."
>>>
>>> While the first issue is our fault, we do not know what causes the
>>> second one. We have followed the instructions at the MOSES advanced
>>> features site and thus specified "extract-settings = "--Placeholder
>>> @tag@"" in training and "-placeholder-factor 1 -xml-input exclusive"
>>> in the decoder and evaluation. Has anyone experienced the same thing
>>> and/or know how to solve this issue?
>>>
>>> Thank you very much.
>>> Best regards,
>>>
>>> Carla
>>>
>>> --
>>> Carla Parra Escartín
>>> Marie Curie Experienced Researcher - EXPERT ITN
>>> http://expert-itn.eu/
>>> Hermes Traducciones
>>> _______________________________________________
>>> Moses-support mailing list
>>> [email protected]
>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>
>>
>> --
>> Carla Parra Escartín
>> Marie Curie Experienced Researcher - EXPERT ITN
>> http://expert-itn.eu/
>> Hermes Traducciones
>>
>
> --
> Carla Parra Escartín
> Marie Curie Experienced Researcher - EXPERT ITN
> http://expert-itn.eu/
> Hermes Traducciones
>
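
The distinction discussed in this thread (bare @tag@ in the training data, <ne> markup only in the decoder input, with the "entity" value escaped) can be sketched roughly as follows. This is a minimal illustration, not the logic of scripts/generic/ph_numbers.perl: the tag pattern (markup such as <2> or </1>), the function names, and the use of standard XML attribute escaping are all assumptions made for the example and should be adapted to the actual data and checked against what the decoder expects.

```python
import html
import re

# Hypothetical pattern for inline markup such as <2> or </1>; adjust to your data.
TAG_PATTERN = re.compile(r"</?\d+>")
PLACEHOLDER = "@tag@"


def to_training_line(line: str) -> str:
    """Training data: replace each inline tag with the bare placeholder token."""
    return TAG_PATTERN.sub(PLACEHOLDER, line)


def to_decoder_line(line: str) -> str:
    """Decoder input: wrap each inline tag in <ne> markup.

    The entity value is escaped with html.escape (assumed here to be the
    escaping the decoder needs) so quotes and angle brackets cannot break
    the XML attribute.
    """
    def wrap(match: re.Match) -> str:
        entity = html.escape(match.group(0), quote=True)
        return (f'<ne translation="{PLACEHOLDER}" '
                f'entity="{entity}">{PLACEHOLDER}</ne>')

    return TAG_PATTERN.sub(wrap, line)


if __name__ == "__main__":
    source = "Allow <1> simple </1> password"
    print(to_training_line(source))  # goes into the parallel corpus
    print(to_decoder_line(source))   # goes into the file passed to the decoder
```

With a split like this, the phrase table should only ever see the @tag@ token, so entries containing "<ne translation=..." fragments like the ones quoted above would not be extracted.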
_______________________________________________ Moses-support mailing list [email protected] http://mailman.mit.edu/mailman/listinfo/moses-support
