Thanks Hieu and Achim for the new feature. I think it's great. Some questions:

1) When invoking mert-moses.pl to tune a model prepared with placeholders, and the dev set includes placeholders, it looks like the new moses command line options (-placeholder-factor 1 -xml-input exclusive) should be passed via "--decoder-flags" or set in the config file. Can you confirm?

2) Are there any limits on which escape sequences can be used as placeholders? Your example was @num@. Could this just as easily be %(num)s if carried through all the necessary steps?

3) If we change your example to

   "you owe me $ 42.85 ."

and update ph_numbers.perl to reformat numbers using the target-language formatting

   "you owe me $ <ne translation="@num@" entity="42,85">@num@</ne> ."

would the corresponding translated output include the 42,85?

4) If the entity="" value must include reserved/special characters, such as &, <, >, or the Moses-reserved vertical bar |, should they be escaped within the quotes the same way the tokenizer.perl and escape-special-chars.perl scripts escape them?

5) As far as I recall, the -xml-input option wasn't particular about which XML tag is used. Is this still true, so that the example could use <anytag/> and still work the same?

6) Any chance to backport this feature to RELEASE-1.0? How much work do you think would be involved? If we choose to do the backport, can you point us in the right direction and do you want the updates for a RELEASE-1.1?
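
Questions 3 and 4 can be made concrete with a minimal sketch of the markup step. This is not the actual ph_numbers.perl; it is an illustrative Python stand-in, and the names mark_numbers, to_target_format and escape_entity are hypothetical. It assumes a decimal-comma target format, and it assumes that XML-style escaping (plus &#124; for the vertical bar) is appropriate inside entity="", which is exactly what question 4 asks to have confirmed.

```python
import re
from xml.sax.saxutils import escape

# Matches simple integers and decimals such as 42 or 42.85 (illustrative only).
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def to_target_format(number: str) -> str:
    """Hypothetical target-language rule: decimal point becomes decimal comma."""
    return number.replace(".", ",")


def escape_entity(value: str) -> str:
    """Escape &, <, > plus the double quote and vertical bar.

    Assumption: the decoder unescapes these inside the attribute value.
    """
    return escape(value, {'"': "&quot;", "|": "&#124;"})


def mark_numbers(sentence: str, placeholder: str = "@num@") -> str:
    """Wrap each number in an <ne> tag that carries the reformatted value."""

    def wrap(match):
        entity = escape_entity(to_target_format(match.group(0)))
        return '<ne translation="{0}" entity="{1}">{0}</ne>'.format(
            placeholder, entity
        )

    return NUMBER.sub(wrap, sentence)


print(mark_numbers("you owe me $ 42.85 ."))
# prints: you owe me $ <ne translation="@num@" entity="42,85">@num@</ne> .
```

Whether the decoder copies the entity value into the output verbatim, and whether it reverses the escaping, is still the open question above; the sketch only shows what the marked-up input would look like.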

Thanks,
Tom




On 10/10/2013 08:30 PM, Hieu Hoang wrote:



On 10 October 2013 13:33, Nicola Bertoldi <[email protected]> wrote:

    Hi Hieu

    I read the documentation and I see a few issues:

    - You mention that you enable the exclusive mode of xml-input;
      this can conflict with other uses of xml-input that instead
      require the inclusive mode.
      Do you have any comments on that?


It can be exclusive, inclusive, or anything else except pass-through. It just requires the XML handling to run.


    - When you use the exclusive mode, you force the translation of the
      span (@num@) with "100", and other larger spans that include
      @num@ are not allowed. Am I right?
      If yes, what is the advantage of having phrase pairs that include
      other words?


It doesn't create XML options; it just needs the XML parsing to run.


    - What is the meaning of "-placeholder-factor 1"?

It stores the original text in source factor 1. The placeholder symbol is in factor 0, or whichever factor the translation model was configured to use.
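
The factor split described here can be pictured with a small conceptual sketch. It is not Moses code: to_factored_tokens is a hypothetical helper, and it assumes that the text inside the <ne> tag is the placeholder symbol (factor 0) and that the entity attribute holds the original text (factor 1).

```python
import re

# Matches the <ne ...>placeholder</ne> markup used in the examples in this thread.
NE_TAG = re.compile(r'<ne\s+translation="[^"]*"\s+entity="([^"]*)">([^<]*)</ne>')


def to_factored_tokens(marked: str):
    """Return (factor0, factor1) pairs; factor1 is None for ordinary words."""
    tokens = []
    pos = 0
    for match in NE_TAG.finditer(marked):
        # Ordinary words before the tag carry no original-text factor.
        tokens += [(word, None) for word in marked[pos:match.start()].split()]
        # Placeholder symbol in factor 0, original text in factor 1.
        tokens.append((match.group(2), match.group(1)))
        pos = match.end()
    tokens += [(word, None) for word in marked[pos:].split()]
    return tokens


for factor0, factor1 in to_factored_tokens(
    'you owe me $ <ne translation="@num@" entity="42,85">@num@</ne> .'
):
    print(factor0, factor1)
```

Under these assumptions the translation model only ever sees @num@ in factor 0, while the stored value travels alongside it in factor 1.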



    Nicola Bertoldi




    On Oct 10, 2013, at 1:05 PM, Hieu Hoang wrote:

    Hi all

    Achim and I have been working on adding support for placeholders
    into Moses. That is, replacing a number, date, or named entity
    with a symbol, e.g. @num@, -date-, =named-entity=. We think it would
    be especially useful for commercial users of Moses, and for people
    translating text with lots of numbers, dates etc.

    It is now supported in the Moses training and decoding pipeline.
    See the following URL for more details.
    http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc60

    --
    Hieu Hoang
    Research Associate
    University of Edinburgh
    http://www.hoang.co.uk/hieu

    _______________________________________________
    Moses-support mailing list
    [email protected]
    http://mailman.mit.edu/mailman/listinfo/moses-support





--
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu


