Thanks Hieu and Achim for the new feature. I think it's great. Some
questions:
1) When invoking mert-moses.pl to tune a model prepared with
placeholders, where the dev set also includes placeholders, it looks like
the new moses command line options (-placeholder-factor 1 -xml-input
exclusive) should be passed via "--decoder-flags" or set in the config
file. Can you confirm?
2) Are there any limits on which character sequences can be used as
placeholders? Your example was @num@. Could this just as easily be
%(num)s if carried through all the necessary steps?
3) If we change your example to
"you owe me $ 42.85 ."
and update ph_numbers.perl to reformat numbers using the target
language's conventions, producing
"you owe me $ <ne translation="@num@" entity="42,85">@num@</ne> ."
would the corresponding translated output include the 42,85?
4) If the entity="" value must include reserved/special characters, such
as &, <, >, or the Moses-reserved vertical bar |, should they be escaped
within the quotes the same way the tokenizer.perl and
escape-special-chars.perl scripts escape them?
5) The last I recall, the -xml-input option wasn't particular about
which XML tag is used. Is this still true, i.e. could the example use
<anytag/> and still work the same?
6) Any chance to backport this feature to RELEASE-1.0? How much work do
you think would be involved? If we choose to do the backport, can you
point us in the right direction and do you want the updates for a
RELEASE-1.1?
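For questions 3 and 4, a rough Python sketch of the markup in question may help. It is not the actual ph_numbers.perl: the function names, the decimal-comma rule, and the escape table (modelled on what escape-special-chars.perl does) are assumptions made only for illustration, and whether the decoder expects this escaping inside entity="" is exactly the open question.

```python
import re

# Assumed escape table, modelled on escape-special-chars.perl.
# "&" must be handled first so later entities are not double-escaped.
ESCAPES = [
    ("&", "&amp;"),
    ("|", "&#124;"),
    ("<", "&lt;"),
    (">", "&gt;"),
    ('"', "&quot;"),
    ("'", "&apos;"),
]

# A whole token that is an integer or a decimal with a point.
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def escape(text):
    """Escape reserved characters in an attribute value (assumed scheme)."""
    for raw, escaped in ESCAPES:
        text = text.replace(raw, escaped)
    return text


def to_target_format(number):
    """Hypothetical target-language formatting: decimal point to comma."""
    return number.replace(".", ",")


def mark_numbers(line, placeholder="@num@"):
    """Wrap each numeric token of a tokenized line in <ne> markup."""
    out = []
    for tok in line.split():
        if NUMBER.fullmatch(tok):
            entity = escape(to_target_format(tok))
            tok = '<ne translation="%s" entity="%s">%s</ne>' % (
                placeholder, entity, placeholder)
        out.append(tok)
    return " ".join(out)


if __name__ == "__main__":
    print(mark_numbers("you owe me $ 42.85 ."))
```

Run on the example sentence, this produces the marked-up line from question 3; the escape() helper shows what entity values containing & or | would look like if the tokenizer-style escaping applies there too.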
Thanks,
Tom
On 10/10/2013 08:30 PM, Hieu Hoang wrote:
On 10 October 2013 13:33, Nicola Bertoldi <[email protected]> wrote:
Hi Hieu
I read the documentation and I see a few issues:
- you mention that you enable the exclusive mode of xml-input;
this can conflict with other uses of xml-input that instead
require the inclusive mode.
Do you have any comments on that?
It can be exclusive, inclusive, or anything else except pass-through.
It just requires the XML handling to run.
- when you use the exclusive mode you force the translation of the
span (@num@) to be "100",
and other larger spans including @num@ are not allowed.
Am I right?
If yes, what is the advantage of having phrase pairs that include
other words?
It doesn't create XML options; it just needs the XML parsing to run.
- what is the meaning of "-placeholder-factor 1"?
It stores the original text in source factor 1. The placeholder
symbol is in factor 0, or whichever factor the translation model was
configured to use.
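As an illustration only, the two-factor layout described above can be pictured with a small Python sketch. This is a conceptual model, not Moses code: the function name, the regular expression, and the assumption that the entity attribute value is what ends up in factor 1 are all hypothetical.

```python
import re

# Matches the placeholder markup used in the documentation example.
NE_TAG = re.compile(
    r'<ne translation="([^"]*)" entity="([^"]*)">([^<]*)</ne>')


def to_factors(token):
    """Return {factor index: value} for one source token.

    Plain words carry only factor 0. A marked-up placeholder carries
    the placeholder symbol in factor 0 and, by assumption, the original
    text (the entity attribute) in factor 1.
    """
    match = NE_TAG.fullmatch(token)
    if match is None:
        return {0: token}
    return {0: match.group(3), 1: match.group(2)}


if __name__ == "__main__":
    print(to_factors('<ne translation="@num@" entity="100">@num@</ne>'))
    print(to_factors("owe"))
```

The translation model would then only ever see the symbol in factor 0, while the original text rides along in factor 1 until the output is produced.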
Nicola Bertoldi
On Oct 10, 2013, at 1:05 PM, Hieu Hoang wrote:
Hi all
Achim and I have been working on adding support for placeholders
into Moses. That is, replacing a number, date, or named entity
with a symbol, e.g. @num@, -date-, =named-entity=. We think it would
be especially useful for commercial users of Moses, and for people
translating text with lots of numbers, dates etc.
It is now supported in the Moses training and decoding pipeline.
See the following URL for more details.
http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc60
--
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support