Hi Tom

Sent while bumping into things
> On 13 Oct 2013, at 17:01, Tom Hoar <[email protected]> wrote:
>
> Thanks Hieu and Achim for the new feature. I think it's great. Some questions:
>
> 1) When invoking mert-moses.pl to tune a model prepared with placeholders,
> and the dev set includes placeholders, it looks like the new moses command
> line options (-placeholder-factor 1 -xml-input exclusive) should be placed in
> the "--decoder-flags" or in the config file. Can you confirm?

Yep, they are decoder flags.

> 2) Are there any limits as to what escape sequences are used as placeholders?
> Your example was @num@. Could this just as easily be %(num)s if carried
> through all the necessary steps?

No limit on what the placeholder 'word' should be.
There can also be multiple, different placeholder words: @num@ for numbers,
%(date) for dates, :place: for place names, etc.

> 3) If we change your example to
>
> "you owe me $ 42.85 ."
>
> and update the ph_numbers.perl to re-format numbers with the target language
> formatting
>
> "you owe me $ <ne translation="@num@" entity="42,85">@num@</ne> ."
>
> would the corresponding translated output include the 42,85?

Yes, 42,85 will be the output. The placeholder script should be language-pair
specific. There are flags to specify source and target language in the script,
but I don't think they are used at the moment. You should extend it.

> 4) If the entity="" value must include reserved/special characters, such as
> &, <, >, or Moses restricted vertical bar | , should they be escaped within
> the quotes like the tokenizer.perl and escape-special-chars.perl scripts
> escape them?

Dunno. Haven't kicked the tyres on this yet. You should err on the safe side
and escape it. Also, since you have to unescape the whole output sentence, not
escaping it may cause you problems.

> 5) The last I recall, the -xml-input option wasn't particular about what XML
> tag is used. Is this still true, the example could be <anytag/> and still
> work the same?

No, it must be <ne ..>
In fact, we're thinking of changing <anytag/> to something fixed, like
<option/>. The <anytag/> behaviour isn't good XML and will cause problems in
the future. Any opinions on this gratefully received.

> 6) Any chance to backport this feature to RELEASE-1.0? How much work do you
> think would be involved? If we choose to do the backport, can you point us in
> the right direction and do you want the updates for a RELEASE-1.1?

Can't add this to release 1. It depends on stuff that's only in the current
github code. The current code will read most ini files you create with
release 1, so that should lessen your pain.
However, it would be good if you can move to release 2.0; it would cause fewer
headaches for you and me. The ini file shouldn't change from what we have now
in github.

> Thanks,
> Tom
>
>> On 10/10/2013 08:30 PM, Hieu Hoang wrote:
>>
>>> On 10 October 2013 13:33, Nicola Bertoldi <[email protected]> wrote:
>>> Hi Hieu
>>>
>>> I read the documentation
>>> and you mention that you enable the exclusive mode of xml-input
>>>
>>> I see few issues:
>>>
>>> - you mention that you enable the exclusive mode of xml-input;
>>> this can conflict with other usage of xml-input which instead require the
>>> inclusive mode.
>>> do you have any comments on that?
>>
>> it can be exclusive, inclusive or anything else except pass-through. It just
>> requires the XML handling to run
>>
>>> - when you use the exclusive mode you force the translation of the span
>>> (@num@) with "100")
>>> and other larger span including @num@ are not allowed
>>> am I right?
>>> If yes, what is the advantage of having phrase pairs including other words
>>
>> it doesn't create XML options, it just needs the XML parsing to run.
>>
>>> - what is the meaning of "-placeholder-factor 1" ?
>>
>> It stores the original text in the source factor 1. The placeholder symbol
>> is in the factor 0, or whatever the translation model was configured to use.
>>>
>>> Nicola Bertoldi
>>>
>>> On Oct 10, 2013, at 1:05 PM, Hieu Hoang wrote:
>>>
>>> Hi all
>>>
>>> Achim and I have been working on adding support for placeholders into
>>> Moses. That is, replacing a number, date, or named entity with a symbol eg.
>>> @num@, -date-, =named-entity=. We think it would be especially useful for
>>> commercial users of Moses, and for people translating text with lots of
>>> numbers, dates etc.
>>>
>>> It is now supported in the Moses training and decoding pipeline. See the
>>> following URL for more details.
>>> h
>>>
>>> --
>>> Hieu Hoang
>>> Research Associate
>>> University of Edinburgh
>>> http://www.hoang.co.uk/hieu
>>>
>>> _______________________________________________
>>> Moses-support mailing list
>>> [email protected]
>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>
>> --
>> Hieu Hoang
>> Research Associate
>> University of Edinburgh
>> http://www.hoang.co.uk/hieu
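
For questions 3 and 4 above, here is a rough Python sketch of what a
language-pair-specific placeholder script might do. It is not the actual
ph_numbers.perl. The function names, the decimal-point-to-decimal-comma rule
and the number pattern are all hypothetical, and the escape table is only
modelled on what tokenizer.perl and escape-special-chars.perl are understood
to do. Whether the decoder unescapes the entity value is untested, as noted
in the answer to question 4.

```python
import re

# Assumed escape table, modelled on tokenizer.perl and
# escape-special-chars.perl. "&" has to be replaced first so the
# entities produced by the later replacements are not escaped twice.
ESCAPES = [
    ("&", "&amp;"),
    ("|", "&#124;"),
    ("<", "&lt;"),
    (">", "&gt;"),
    ("'", "&apos;"),
    ('"', "&quot;"),
    ("[", "&#91;"),
    ("]", "&#93;"),
]

# Hypothetical number pattern: digits with an optional decimal part.
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def escape(text):
    """Escape characters that are reserved in XML or in Moses input."""
    for raw, escaped in ESCAPES:
        text = text.replace(raw, escaped)
    return text


def to_target_format(number):
    """Hypothetical target-language rule: decimal point becomes a comma."""
    return number.replace(".", ",")


def mark_placeholders(sentence, symbol="@num@"):
    """Wrap each number in an <ne> tag carrying the reformatted value."""
    def wrap(match):
        entity = escape(to_target_format(match.group(0)))
        return '<ne translation="%s" entity="%s">%s</ne>' % (symbol, entity, symbol)
    return NUMBER.sub(wrap, sentence)


print(mark_placeholders("you owe me $ 42.85 ."))
```

Only the entity value is escaped here; the rest of the sentence is assumed to
have been escaped already by the usual tokenization step.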
