Dear all, we have just finished some experiments using placeables and have observed a couple of issues that may be worth sharing. I don't know whether anyone has experienced the same, or whether you were already aware of them, but just in case:
(1) Special characters must be escaped in the "entity" value field. Otherwise, they cause XML parsing errors at tuning (not at training, though!), and wrong values are retrieved from the tags. For example, we had text with additional quotation marks, and this caused the retrieved value to be cut off at the first quotation mark, so we did not get back the complete "entity" value we had encoded.

(2) <ne> tags are added to sentences as if they had been computed as tokens during training (i.e. they are not ignored, even though they just contain the placeables). As an example, the English sentence "Allow simple password" is translated as:

    Permitir simple contraseña <ne translation="@tag@" entity="</1>">@tag@</ne> .

While the first issue is our fault, we do not know what causes the second one. We have followed the instructions on the MOSES advanced features site and thus specified

    extract-settings = "--Placeholder @tag@"

in training and

    -placeholder-factor 1 -xml-input exclusive

in the decoder and evaluation.

Has anyone experienced the same thing and/or does anyone know how to solve this issue?

Thank you very much.

Best regards,
Carla

--
Carla Parra Escartín
Marie Curie Experienced Researcher - EXPERT ITN
http://expert-itn.eu/
Hermes Traducciones

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
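
P.S. Regarding issue (1), here is a minimal sketch of how the "entity" value could be escaped before the <ne> tag is built, assuming the preprocessing is done in Python. The helper name wrap_placeable is hypothetical and not part of Moses. It applies standard XML escaping only. Whether the decoder's XML input parser turns these entities back into the original characters is an assumption that should be checked on your own setup.

```python
from xml.sax.saxutils import escape


def wrap_placeable(entity, placeholder="@tag@"):
    """Build an <ne> tag whose "entity" attribute is XML-escaped.

    Hypothetical helper for illustration; not part of Moses.
    """
    # escape() handles &, < and > by default; the extra mapping also
    # escapes double quotes so they cannot end the attribute value early.
    safe = escape(entity, {'"': "&quot;"})
    return '<ne translation="{0}" entity="{1}">{0}</ne>'.format(placeholder, safe)


if __name__ == "__main__":
    print(wrap_placeable("</1>"))
    print(wrap_placeable('<a title="x">'))
```

With this, a value such as </1> is written as &lt;/1&gt; inside the attribute, and embedded quotation marks become &quot;, so a standard XML parser reads the whole value back instead of stopping at the first quotation mark.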
