Re: [Moses-support] placeholders for numbers - extract step
hi vito

On 18 November 2014 11:30, Vito Mandorino vito.mandor...@linguacustodia.com wrote:

> Hello everyone,
>
> I am trying to use placeholders for numbers in phrase-based MT, according to
> http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc75
>
> The above page says
> ---
> During extraction, add the following to the extract command (phrase-based only for now):
>    ./extract --Placeholders @num@
> --
> Does this mean that I have to first run train-model.perl with --last-step=4, then the line above and then again train-model.perl with --first-step=6?

when you run train-model.perl, add the argument
   -extract-options '--Placeholders @num@'
You can see it in this script that the EMS creates
   http://www.statmt.org/moses/RELEASE-2.1/models/cs-en/steps/3/TRAINING_extract-phrases.3

> If this is the case, which arguments and options should I pass to extract for a baseline training? I think the syntax is something like

The script will then call extract with the following argument
   --Placeholders @num@
You can see it in the STDERR file of the above script
   http://www.statmt.org/moses/RELEASE-2.1/models/cs-en/steps/3/TRAINING_extract-phrases.3.STDERR

> syntax: extract en de align extract max-length [orientation [ --model [wbe|phrase|hier]-[msd|mslr|mono] ] | --OnlyOutputSpanInfo | --NoTTable | --GZOutput | --IncludeSentenceId | --SentenceOffset n | --InstanceWeights filename ]
>
> In particular I cannot figure out what should be passed as 'align' and 'extract' arguments.
>
> Regards,
> Vito

--
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
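The point of the reply above is that '--Placeholders @num@' has to reach train-model.perl as one single argument after -extract-options. A minimal sketch of building such a command line, assuming hypothetical corpus settings (the --root-dir, --corpus, --f and --e values below are placeholders for illustration, not taken from this thread):

```python
import shlex

def train_model_command(extract_options, base_args):
    """Return the argv list for train-model.perl with extra extract options."""
    argv = ["train-model.perl", *base_args]
    # The whole option string is ONE argument; train-model.perl hands it on to extract.
    argv += ["-extract-options", extract_options]
    return argv

# Hypothetical corpus settings, for illustration only.
base = ["--root-dir", "train", "--corpus", "corpus/train.clean", "--f", "cs", "--e", "en"]
argv = train_model_command("--Placeholders @num@", base)

# shlex.join adds the shell quoting needed to keep the option string together.
print(shlex.join(argv))
```

Passing the list to subprocess.run avoids the quoting question entirely; the printed form is what you would paste into a shell.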
Re: [Moses-support] placeholders for numbers - extract step
Thank you Hieu, that worked very well. I am now tackling the decoding part and I have two questions.

1) Sometimes, I get the following error message during decoding:

   terminate called after throwing an instance of 'util::Exception'
     what(): moses-cmd/IOWrapper.cpp:213 in std::map<long unsigned int, const Moses::Factor*> MosesCmd::GetPlaceholders(const Moses::Hypothesis, Moses::FactorType) threw util::Exception because `targetPos.size() != 1'.
   Placeholder should be aligned to 1, and only 1, word
   Aborted

I don't understand why. I checked the phrase-table and I didn't find phrase pairs where the '@num@' token is aligned to 2 or more words.

2) This may be related to the first question. If I run the decoder to translate the input using the suggested command

   ./moses -placeholder-factor 1 -xml-input exclusive

I get the '@num@' string in the output and not the expected number. I do get the number if I use the option '-placeholder-factor 0'. The model that I am using is a phrase-based, non-factored model.

Vito

2014-11-19 10:32 GMT+01:00 Hieu Hoang hieu.ho...@ed.ac.uk:

> when you run train-model.perl, add the argument
>    -extract-options '--Placeholders @num@'
> You can see it in this script that the EMS creates
>    http://www.statmt.org/moses/RELEASE-2.1/models/cs-en/steps/3/TRAINING_extract-phrases.3
>
> The script will then call extract with the following argument
>    --Placeholders @num@
> You can see it in the STDERR file of the above script
>    http://www.statmt.org/moses/RELEASE-2.1/models/cs-en/steps/3/TRAINING_extract-phrases.3.STDERR
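The error text says a placeholder must be aligned to exactly one word, so a placeholder with no alignment point at all would presumably fail the same check as one aligned to two or more. A rough sketch for scanning a phrase table for both cases, assuming the usual text format `source ||| target ||| scores ||| alignment ||| counts` with the word alignment in the fourth field (the function name and sample lines are made up for illustration):

```python
from collections import defaultdict

def misaligned_placeholders(line, placeholder="@num@"):
    """Return source positions of `placeholder` NOT aligned to exactly one target word."""
    fields = [f.strip() for f in line.split("|||")]
    source = fields[0].split()
    targets = defaultdict(set)            # source position -> aligned target positions
    for pair in fields[3].split():        # alignment points look like "0-0 1-2"
        s, t = pair.split("-")
        targets[int(s)].add(int(t))
    return [i for i, tok in enumerate(source)
            if tok == placeholder and len(targets[i]) != 1]

# Made-up phrase-table lines, for illustration only.
ok        = "@num@ euros ||| @num@ EUR ||| 0.5 0.5 0.5 0.5 ||| 0-0 1-1 ||| 10 10 10"
unaligned = "@num@ euros ||| EUR ||| 0.1 0.1 0.1 0.1 ||| 1-0 ||| 1 1 1"
doubled   = "@num@ ||| @num@ , @num@ ||| 0.1 0.1 0.1 0.1 ||| 0-0 0-2 ||| 1 1 1"

for line in (ok, unaligned, doubled):
    print(misaligned_placeholders(line), line.split("|||")[0].strip())
```

For a real table the same function can be applied line by line to the opened (gzip) file, printing only the lines where the returned list is non-empty.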
Re: [Moses-support] Placeholders
<anytag/> is XML-compliant in schema-less XML (as long as the tag name complies to http://www.w3.org/TR/REC-xml/#NT-Name)

IMHO Moses input (with the -xml-input option) should stay schema-less, or we should define a schema. Right now I can't see a pressing reason to define a schema.

In any case it would be good to parse the input (with the -xml-input option) with a proper XML parser, e.g.
http://www.boost.org/doc/libs/1_54_0/doc/html/boost_propertytree/parsers.html#boost_propertytree.parsers.xml_parser
There are probably better XML parsers, but Moses already requires Boost. Using an XML parser could also solve some of the character escaping uncertainty.

Achim

From: moses-support-boun...@mit.edu [mailto:moses-support-boun...@mit.edu] On Behalf Of supp...@precisiontranslationtools.com
Sent: Tuesday, October 15, 2013 10:25 PM
To: moses-support@mit.edu
Subject: Re: [Moses-support] Placeholders

A change from <anytag/> will no-doubt disrupt existing pipelines. Communicating the change with the new release will be a great help.
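To illustrate the suggestion of handing the marked-up input to a real XML parser, here is a small sketch. It uses Python's standard xml.etree.ElementTree as a stand-in for the Boost parser mentioned above, and it is not how Moses itself reads its input; the function name and the wrapping `<s>` root element are assumptions made for the example. Only top-level tags are handled.

```python
import xml.etree.ElementTree as ET

def parse_marked_up_sentence(sentence):
    """Parse one input line as XML; return (tokens, [(token_index, entity), ...])."""
    # A sentence has no single root element, so wrap it in one before parsing.
    root = ET.fromstring(f"<s>{sentence}</s>")
    tokens, placeholders = [], []

    def add_text(text):
        if text:
            tokens.extend(text.split())

    add_text(root.text)                      # text before the first tag
    for elem in root:                        # top-level tags only, no nesting
        if elem.tag == "ne":
            placeholders.append((len(tokens), elem.get("entity")))
        add_text(elem.text)                  # text inside the tag
        add_text(elem.tail)                  # text after the closing tag
    return tokens, placeholders

tokens, placeholders = parse_marked_up_sentence(
    'you owe me $ <ne translation="@num@" entity="100">@num@</ne> .')
print(tokens, placeholders)
```

A side effect of a real parser is that entity escaping is handled in one place: `AT&amp;T` comes back as `AT&T`, and malformed input raises `ET.ParseError` instead of being silently passed through.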
Re: [Moses-support] Placeholders
The reality is that the current --xml-input functionality straddles the fence between the schema-less and defined-schema worlds. It's <anytag/> except <wall/> and <zone/> and <ne/>.

Moses currently supports only four functions with XML markup: specifying alternate translation, walls, zones and named entities. I'm not sure a full XML parser is necessary for four functions, but the chance of accidental conflicts grows with the number of functions.

It seems more efficient to assign a tag name to the only current function that doesn't have a reserved tag name. Then, the undefined tag names become the exception that Moses ignores.

Tom

On 10/16/2013 11:16 PM, Achim Ruopp wrote:

> <anytag/> is XML-compliant in schema-less XML (as long as the tag name complies to http://www.w3.org/TR/REC-xml/#NT-Name)
>
> IMHO Moses input (with the -xml-input option) should stay schema-less, or we should define a schema. Right now I can't see a pressing reason to define a schema.
>
> In any case it would be good to parse the input (with the -xml-input option) with a proper XML parser, e.g.
> http://www.boost.org/doc/libs/1_54_0/doc/html/boost_propertytree/parsers.html#boost_propertytree.parsers.xml_parser
> There are probably better XML parsers, but Moses already requires Boost. Using an XML parser could also solve some of the character escaping uncertainty.
>
> Achim
Re: [Moses-support] Placeholders
they're good ideas. I'll have a think if I get round to doing it. Would also want to minimise the work I have to do, and minimize the disruption to people's existing pipeline.

On 15 October 2013 01:33, Tom Hoar tah...@precisiontranslationtools.com wrote:

> I agree that <anytag/> could cause problems, especially with the growing list of reserved tag names (ne, wall, zone). I wholeheartedly support a fixed tag, but I'm not sure <option/> is it. What about <np/> (already in the manual) or <xml-markup/> or <xml-input/> or <moses/>?
>
> Here's another idea. The -xml-input flag supports values exclusive, inclusive, ignore and pass-through. What about changing the flag to a boolean flag. Then, use the value as the xml tags: <exclusive/>, <inclusive/> and <ignore/> so the one invocation of Moses would support all modes on a per-sentence basis. Just a thought. Think this would also be easier if you dropped the pass-through option because no need for backwards compatibility.
>
> Another idea, although slightly different subject. Moses' -monotone-at-punctuation flag would be more useful if we could define/override the punctuation symbols that we want it to use. Not sure how to best accomplish this.
>
> Tom

--
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu
Re: [Moses-support] Placeholders
A change from <anytag/> will no-doubt disrupt existing pipelines. Communicating the change with the new release will be a great help.

On 2013-10-15 01:35, Hieu Hoang wrote:

> they're good ideas. I'll have a think if I get round to doing it. Would also want to minimise the work I have to do, and minimize the disruption to people's existing pipeline.
Re: [Moses-support] Placeholders
Hi tom

Sent while bumping into things

On 13 Oct 2013, at 17:01, Tom Hoar tah...@precisiontranslationtools.com wrote:

> Thanks Hieu and Achim for the new feature. I think it's great. Some questions:
>
> 1) When invoking mert-moses.pl to tune a model prepared with placeholders, and the dev set includes placeholders, it looks like the new moses command line options (-placeholder-factor 1 -xml-input exclusive) should be placed in the --decoder-flags or in the config file. Can you confirm?

Yep, they are decoder flags.

> 2) Are there any limits as to what escape sequences are used as placeholders? Your example was @num@. Could this just as easily be %(num)s if carried through all the necessary steps?

No limit on what the placeholder 'word' should be. There can also be multiple, different placeholder words: @num@ for numbers, %(date) for dates, :place: for place names etc.

> 3) If we change your example to
>    you owe me $ 42.85 .
> and update the ph_numbers.perl to re-format numbers with the target language formatting
>    you owe me $ <ne translation="@num@" entity="42,85">@num@</ne> .
> would the corresponding translated output include the 42,85?

Yes, 42,85 will be the output. The placeholder script should be language pair specific. There are flags to specify source and target language in the script but I don't think they are used at the moment. You should extend it.

> 4) If the entity= value must include reserved/special characters, such as <, >, &, or Moses restricted vertical bar | , should they be escaped within the quotes like the tokenizer.perl and escape-special-chars.perl scripts escape them?

Dunno. Haven't kicked the tyres on this yet. You should err on the safe side and escape it. Also, since you have to unescape the whole output sentence, not escaping it may cause you problems.

> 5) The last I recall, the --xml-input option wasn't particular about what XML tag is used. Is this still true, the example could be <anytag/> and still work the same?

No, it must be <ne ..>. In fact, we're thinking of changing <anytag/> to something fixed, like <option/>. The <anytag/> behaviour isn't good XML and will cause problems in the future. Any opinions on this gratefully received.

> 6) Any chance to backport this feature to RELEASE-1.0? How much work do you think would be involved? If we choose to do the backport, can you point us in the right direction and do you want the updates for a RELEASE-1.1?

Can't add this to release 1. It depends on stuff that's only in the current github code. The current code will read most ini files you create with release 1, so that should lessen your pain. However, it would be good if you can move to release 2.0, it would cause less headaches for you and me. The ini file shouldn't change from what we have now in github.

> Thanks,
> Tom
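The answers to questions 3 and 4 can be combined into one small sketch: reformat the number for the target language, escape the value, and wrap it in the placeholder markup. The escape table below follows what escape-special-chars.perl does as far as I understand it, and the decimal-mark swap is a deliberately naive stand-in for real locale handling. The function names are made up for the example and are not part of ph_numbers.perl.

```python
# Replacement pairs as used by the Moses escaping scripts (to my understanding).
MOSES_ESCAPES = [("&", "&amp;"), ("|", "&#124;"), ("<", "&lt;"), (">", "&gt;"),
                 ("'", "&apos;"), ('"', "&quot;"), ("[", "&#91;"), ("]", "&#93;")]

def escape_moses(text):
    """Escape characters that are special to Moses or to the XML markup."""
    for raw, escaped in MOSES_ESCAPES:   # "&" goes first so later entities are not re-escaped
        text = text.replace(raw, escaped)
    return text

def number_placeholder(number, decimal_mark=","):
    """Wrap a source-side number in <ne> markup, formatted for the target language."""
    entity = number.replace(".", decimal_mark)   # naive: ignores thousands separators
    return f'<ne translation="@num@" entity="{escape_moses(entity)}">@num@</ne>'

print("you owe me $", number_placeholder("42.85"), ".")
print(escape_moses("a|b & <c>"))
```

Whatever is escaped here has to be unescaped again on the decoder output, which is the reason given above for escaping the entity value consistently with the rest of the sentence.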
Re: [Moses-support] Placeholders
I agree that <anytag/> could cause problems, especially with the growing list of reserved tag names (ne, wall, zone). I wholeheartedly support a fixed tag, but I'm not sure <option/> is it. What about <np/> (already in the manual) or <xml-markup/> or <xml-input/> or <moses/>?

Here's another idea. The -xml-input flag supports values exclusive, inclusive, ignore and pass-through. What about changing the flag to a boolean flag. Then, use the value as the xml tags: <exclusive/>, <inclusive/> and <ignore/> so the one invocation of Moses would support all modes on a per-sentence basis. Just a thought. Think this would also be easier if you dropped the pass-through option because no need for backwards compatibility.

Another idea, although slightly different subject. Moses' -monotone-at-punctuation flag would be more useful if we could define/override the punctuation symbols that we want it to use. Not sure how to best accomplish this.

Tom

On 10/15/2013 04:07 AM, Hieu Hoang wrote:

> In fact, we're thinking of changing <anytag/> to something fixed, like <option/>
> The <anytag/> behaviour isn't good XML and will cause problems in the future
> Any opinions on this gratefully received
Re: [Moses-support] Placeholders
Thanks Hieu and Achim for the new feature. I think it's great. Some questions:

1) When invoking mert-moses.pl to tune a model prepared with placeholders, and the dev set includes placeholders, it looks like the new moses command line options (-placeholder-factor 1 -xml-input exclusive) should be placed in the --decoder-flags or in the config file. Can you confirm?

2) Are there any limits as to what escape sequences are used as placeholders? Your example was @num@. Could this just as easily be %(num)s if carried through all the necessary steps?

3) If we change your example to
   you owe me $ 42.85 .
and update the ph_numbers.perl to re-format numbers with the target language formatting
   you owe me $ <ne translation="@num@" entity="42,85">@num@</ne> .
would the corresponding translated output include the 42,85?

4) If the entity= value must include reserved/special characters, such as <, >, &, or Moses restricted vertical bar | , should they be escaped within the quotes like the tokenizer.perl and escape-special-chars.perl scripts escape them?

5) The last I recall, the --xml-input option wasn't particular about what XML tag is used. Is this still true, the example could be <anytag/> and still work the same?

6) Any chance to backport this feature to RELEASE-1.0? How much work do you think would be involved? If we choose to do the backport, can you point us in the right direction and do you want the updates for a RELEASE-1.1?

Thanks,
Tom

On 10/10/2013 08:30 PM, Hieu Hoang wrote:

> On 10 October 2013 13:33, Nicola Bertoldi berto...@fbk.eu wrote:
>
>> Hi Hieu
>> I read the documentation and you mention that you enable the exclusive mode of xml-input
>> I see few issues:
>> - you mention that you enable the exclusive mode of xml-input; this can conflict with other usage of xml-input which instead require the inclusive mode. do you have any comments on that?
>
> it can be exclusive, inclusive or anything else except pass-through. It just requires the XML handling to run
>
>> - when you use the exclusive mode you force the translation of the span (@num@) with 100) and other larger span including @num@ are not allowed am I right? If yes, what is the advantage of having phrase pairs including other words
>
> it doesn't create XML options, it just needs the XML parsing to run.
>
>> - what is the meaning of -placeholder-factor 1 ?
>
> It stores the original text in the source factor 1. The placeholder symbol is in the factor 0, or whatever the translation model was configured to use.
>
>> Nicola Bertoldi
>
> --
> Hieu Hoang
> Research Associate
> University of Edinburgh
> http://www.hoang.co.uk/hieu
Re: [Moses-support] Placeholders
Hi,
place holders would be useful. What's the implication of "it just needs the XML parsing to run"? Does this mean that the option could only be used with html-input? Not with plain text?

Yours,
Per Tunedal

On Thu, Oct 10, 2013, at 15:30, Hieu Hoang wrote:

> On 10 October 2013 13:33, Nicola Bertoldi berto...@fbk.eu wrote:
>
>> - when you use the exclusive mode you force the translation of the span (@num@) with 100) and other larger span including @num@ are not allowed am I right? If yes, what is the advantage of having phrase pairs including other words
>
> it doesn't create XML options, it just needs the XML parsing to run.
Re: [Moses-support] Placeholders
it doesn't process HTML. Just plain text, and XML markups to indicate the placeholder.

For example, your original sentence is
   you owe me $ 100 .
After running it through the placeholder script, the sentence becomes
   you owe me $ <ne translation="@num@" entity="100">@num@</ne> .
The XML processing is needed to parse this XML.

On 11 October 2013 07:08, Per Tunedal per.tune...@operamail.com wrote:

> Hi,
> place holders would be useful. What's the implication of "it just needs the XML parsing to run"? Does this mean that the option could only be used with html-input? Not with plain text?
>
> Yours,
> Per Tunedal

--
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu
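As a rough illustration of what "running it through the placeholder script" amounts to, here is a sketch of the transformation on a tokenized sentence. It is not the actual ph_numbers.perl; the number pattern and the function name are assumptions chosen for the example.

```python
import re

# A token counts as a number if it is digits, optionally grouped by '.' or ','.
NUMBER = re.compile(r"^\d+(?:[.,]\d+)*$")

def mark_numbers(sentence, symbol="@num@"):
    """Replace each number token with <ne> markup that carries the original value."""
    out = []
    for token in sentence.split():
        if NUMBER.match(token):
            # The model sees `symbol`; the original number rides along in entity="...".
            out.append(f'<ne translation="{symbol}" entity="{token}">{symbol}</ne>')
        else:
            out.append(token)
    return " ".join(out)

print(mark_numbers("you owe me $ 100 ."))
```

The marked-up line is then what gets passed to the decoder with the -placeholder-factor and -xml-input options discussed in this thread, so everything outside the tags stays plain text.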
Re: [Moses-support] Placeholders
Hi Hieu,

Excellent! Thank you for your explanation.

Yours,
Per Tunedal

On Fri, Oct 11, 2013, at 13:24, Hieu Hoang wrote:
> It doesn't process HTML, just plain text with XML markup to indicate the placeholder.
>
> For example, your original sentence is
>     you owe me $ 100 .
> After running it through the placeholder script, the sentence becomes
>     you owe me $ <ne translation="@num@" entity="100">@num@</ne> .
> The XML processing is needed to parse this markup.
>
> On 11 October 2013 07:08, Per Tunedal per.tune...@operamail.com wrote:
>> Hi,
>> Placeholders would be useful. What is the implication of "it just needs the XML parsing to run"? Does this mean that the option can only be used with HTML input, and not with plain text?
>> Yours,
>> Per Tunedal
>>
>> On Thu, Oct 10, 2013, at 15:30, Hieu Hoang wrote:
>>> On 10 October 2013 13:33, Nicola Bertoldi berto...@fbk.eu wrote:
>>>> Hi Hieu
>>>> I read the documentation, where you mention that you enable the exclusive mode of xml-input. I see a few issues:
>>>> - You mention that you enable the exclusive mode of xml-input; this can conflict with other uses of xml-input which instead require the inclusive mode. Do you have any comments on that?
>>> It can be exclusive, inclusive, or anything else except pass-through. It just requires the XML handling to run.
>>>> - When you use the exclusive mode you force the translation of the span (@num@ with 100), and larger spans including @num@ are not allowed. Am I right? If so, what is the advantage of having phrase pairs that include other words?
>>> It doesn't create XML options, it just needs the XML parsing to run.
>>>> - What is the meaning of -placeholder-factor 1 ?
>>> It stores the original text in source factor 1. The placeholder symbol is in factor 0, or whatever factor the translation model was configured to use.
>>>> Nicola Bertoldi
>>>>
>>>> On Oct 10, 2013, at 1:05 PM, Hieu Hoang wrote:
>>>>> Hi all
>>>>> Achim and I have been working on adding support for placeholders into Moses. That is, replacing a number, date, or named entity with a symbol, e.g. @num@, -date-, =named-entity=.
>>>>> We think it would be especially useful for commercial users of Moses, and for people translating text with lots of numbers, dates etc. It is now supported in the Moses training and decoding pipeline. See the following URL for more details.
>>>>> http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc60
>>>>> --
>>>>> Hieu Hoang
>>>>> Research Associate
>>>>> University of Edinburgh
>>>>> http://www.hoang.co.uk/hieu
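The markup discussed in this message can be illustrated with a small sketch. It assumes the placeholder markup has the form <ne translation="@num@" entity="100">@num@</ne>, that the input is already tokenized, and that only plain digit sequences count as numbers. The function name mark_numbers and the number pattern are hypothetical; this is not the script that ships with Moses.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Assumption: a "number" is a run of digits, optionally with . or , groups.
NUMBER = re.compile(r"\d+(?:[.,]\d+)*")


def mark_numbers(sentence: str, symbol: str = "@num@") -> str:
    """Wrap every numeric token of a tokenized sentence in <ne> markup.

    The original text goes into the entity attribute and the placeholder
    symbol replaces it in the running text.
    """
    out = []
    for token in sentence.split():
        if NUMBER.fullmatch(token):
            out.append(
                f"<ne translation={quoteattr(symbol)} "
                f"entity={quoteattr(token)}>{escape(symbol)}</ne>"
            )
        else:
            out.append(token)
    return " ".join(out)


print(mark_numbers("you owe me $ 100 ."))
```

Sentences without numbers pass through unchanged, so the same filter could be applied to a whole corpus before decoding.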
Re: [Moses-support] Placeholders
Hi Hieu

I read the documentation, where you mention that you enable the exclusive mode of xml-input. I see a few issues:

- You mention that you enable the exclusive mode of xml-input; this can conflict with other uses of xml-input which instead require the inclusive mode. Do you have any comments on that?
- When you use the exclusive mode you force the translation of the span (@num@ with 100), and larger spans including @num@ are not allowed. Am I right? If so, what is the advantage of having phrase pairs that include other words?
- What is the meaning of -placeholder-factor 1 ?

Nicola Bertoldi

On Oct 10, 2013, at 1:05 PM, Hieu Hoang wrote:
> Hi all
> Achim and I have been working on adding support for placeholders into Moses. That is, replacing a number, date, or named entity with a symbol, e.g. @num@, -date-, =named-entity=.
> We think it would be especially useful for commercial users of Moses, and for people translating text with lots of numbers, dates etc. It is now supported in the Moses training and decoding pipeline. See the following URL for more details.
> http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc60
> --
> Hieu Hoang
> Research Associate
> University of Edinburgh
> http://www.hoang.co.uk/hieu
Re: [Moses-support] Placeholders
On 10 October 2013 13:33, Nicola Bertoldi berto...@fbk.eu wrote:
> Hi Hieu
> I read the documentation, where you mention that you enable the exclusive mode of xml-input. I see a few issues:
> - You mention that you enable the exclusive mode of xml-input; this can conflict with other uses of xml-input which instead require the inclusive mode. Do you have any comments on that?

It can be exclusive, inclusive, or anything else except pass-through. It just requires the XML handling to run.

> - When you use the exclusive mode you force the translation of the span (@num@ with 100), and larger spans including @num@ are not allowed. Am I right? If so, what is the advantage of having phrase pairs that include other words?

It doesn't create XML options, it just needs the XML parsing to run.

> - What is the meaning of -placeholder-factor 1 ?

It stores the original text in source factor 1. The placeholder symbol is in factor 0, or whatever factor the translation model was configured to use.

> Nicola Bertoldi
>
> On Oct 10, 2013, at 1:05 PM, Hieu Hoang wrote:
>> Hi all
>> Achim and I have been working on adding support for placeholders into Moses. That is, replacing a number, date, or named entity with a symbol, e.g. @num@, -date-, =named-entity=.
>> We think it would be especially useful for commercial users of Moses, and for people translating text with lots of numbers, dates etc. It is now supported in the Moses training and decoding pipeline. See the following URL for more details.
>> http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc60

--
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu
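The factor explanation above can be pictured with a purely conceptual sketch: each source token carries the symbol the translation model sees (factor 0) and, for placeholders only, the original text (factor 1). The Token type, the to_factors function, the assumed <ne ...> markup form, and the use of None for ordinary words are all illustrative assumptions; this does not reproduce Moses's internal data structures.

```python
import re
from typing import List, NamedTuple, Optional


class Token(NamedTuple):
    factor0: str            # what the translation model matches against
    factor1: Optional[str]  # original surface text; only set for placeholders


# Assumed markup form: <ne translation="SYMBOL" entity="ORIGINAL">SYMBOL</ne>
NE = re.compile(r'<ne translation="([^"]*)" entity="([^"]*)">[^<]*</ne>')


def to_factors(marked: str) -> List[Token]:
    """Split a marked-up sentence into (factor 0, factor 1) tokens."""
    tokens: List[Token] = []
    pos = 0
    for match in NE.finditer(marked):
        # Ordinary words before the placeholder carry no second factor.
        tokens += [Token(w, None) for w in marked[pos:match.start()].split()]
        tokens.append(Token(match.group(1), match.group(2)))
        pos = match.end()
    tokens += [Token(w, None) for w in marked[pos:].split()]
    return tokens


for tok in to_factors('you owe me $ <ne translation="@num@" entity="100">@num@</ne> .'):
    print(tok)
```

Under this reading, phrase pairs are looked up on factor 0 only, which is why phrases containing @num@ together with other words remain usable.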
Re: [Moses-support] Placeholders missed
Thanks Daniel and Tomas.

I solved the issue with the PHP script below. The solution is like Daniel's: delete the blank spaces inside the placeholders, so that each placeholder is treated as a single token during decoding. I run the script just before decoding.

    <?php
    // Rejoin placeholders split by tokenization, e.g. "{ 70 }" becomes "{70}".
    $fp = fopen("true.en", "r");
    $fo = fopen("out_compacted.en", "w");
    while (($line = fgets($fp)) !== false) {
        $line = preg_replace('/\{ (\d+) \}/', '{$1}', $line);
        fputs($fo, $line);
    }
    fclose($fp);
    fclose($fo);

Thanks,
Henry

On Tue, Jul 10, 2012 at 2:55 PM, Tomas Hudik thu...@moravia.com wrote:
> Hi Henry,
>
> This answer is probably coming late, but: we have developed a small piece of software for placeholder translation. It is under the same license as Moses.
> http://code.google.com/p/m4loc/
> If you want to try it, download the sources (it is mostly Perl, so you do not need to compile it). The input should be a TMX or XLIFF file (localization file formats). Be aware that the results won't be 100% correct.
>
> Cheers,
> Tomas
>
> -----Original Message-----
> From: Henry Hu [mailto:henryhu...@gmail.com]
> Sent: Monday, July 02, 2012 11:40 AM
> To: moses-support@mit.edu
> Subject: [Moses-support] Placeholders missed
>
> Hi guys,
>
> I'm attempting to translate English to French. First I replaced some tags with placeholders such as {70}, then decoded, and finally restored the tags. Most placeholders stayed intact through decoding, like this:
>
> English: buy { 70 } and enjoy unlimited Trainings sessions .
> French: acheter { 70 } et amusez-vous illimitée formations sessions .
>
> However, some placeholders come out incomplete, like this (the { is missing):
>
> English: acheter { 70 } et amusez-vous illimitée formations sessions .
> French: illimitée des réunions , chaque avec jusqu' à 70 } les participants
>
> I guess I should use different placeholders. But what placeholders are good options?
>
> Thanks for any suggestions.
>
> Best regards,
> Henry
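For readers who prefer Python, the compaction step described in this message can be sketched as follows. It assumes that tokenization turns a placeholder {70} into "{ 70 }" with exactly one space on each side and a purely numeric id; the function name compact_placeholders is hypothetical.

```python
import re

# Assumption: a split placeholder looks like "{ 70 }" after tokenization.
SPLIT_PLACEHOLDER = re.compile(r"\{ (\d+) \}")


def compact_placeholders(line: str) -> str:
    """Rejoin "{ N }" into "{N}" so the decoder sees a single token."""
    return SPLIT_PLACEHOLDER.sub(r"{\1}", line)


print(compact_placeholders("buy { 70 } and enjoy unlimited Trainings sessions ."))
```

Because a compacted {70} is a token the existing models have probably never seen, it would most likely pass through the decoder as an unknown word, which is the desired behaviour here.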
Re: [Moses-support] Placeholders missed
Hi Henry,

This answer is probably coming late, but: we have developed a small piece of software for placeholder translation. It is under the same license as Moses.
http://code.google.com/p/m4loc/
If you want to try it, download the sources (it is mostly Perl, so you do not need to compile it). The input should be a TMX or XLIFF file (localization file formats). Be aware that the results won't be 100% correct.

Cheers,
Tomas

-----Original Message-----
From: Henry Hu [mailto:henryhu...@gmail.com]
Sent: Monday, July 02, 2012 11:40 AM
To: moses-support@mit.edu
Subject: [Moses-support] Placeholders missed

Hi guys,

I'm attempting to translate English to French. First I replaced some tags with placeholders such as {70}, then decoded, and finally restored the tags. Most placeholders stayed intact through decoding, like this:

English: buy { 70 } and enjoy unlimited Trainings sessions .
French: acheter { 70 } et amusez-vous illimitée formations sessions .

However, some placeholders come out incomplete, like this (the { is missing):

English: acheter { 70 } et amusez-vous illimitée formations sessions .
French: illimitée des réunions , chaque avec jusqu' à 70 } les participants

I guess I should use different placeholders. But what placeholders are good options?

Thanks for any suggestions.

Best regards,
Henry
Re: [Moses-support] Placeholders missed
Hi Henry

Either use the Moses xml-input feature
http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc5
or choose a placeholder that does not appear in your phrase table.

cheers - Barry

On Monday 02 July 2012 10:40:17 Henry Hu wrote:
> Hi guys,
>
> I'm attempting to translate English to French. First I replaced some tags with placeholders such as {70}, then decoded, and finally restored the tags. Most placeholders stayed intact through decoding, like this:
>
> English: buy { 70 } and enjoy unlimited Trainings sessions .
> French: acheter { 70 } et amusez-vous illimitée formations sessions .
>
> However, some placeholders come out incomplete, like this (the { is missing):
>
> English: acheter { 70 } et amusez-vous illimitée formations sessions .
> French: illimitée des réunions , chaque avec jusqu' à 70 } les participants
>
> I guess I should use different placeholders. But what placeholders are good options?
>
> Thanks for any suggestions.
>
> Best regards,
> Henry

--
Barry Haddow
University of Edinburgh
+44 (0) 131 651 3173

--
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
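The first option above, xml-input, could look roughly like the following sketch: each placeholder is wrapped in markup that carries its own text as the forced translation. It assumes untokenized placeholders of the form {70}, that the decoder is started with -xml-input set to exclusive (or inclusive) so the markup is actually read, and that the tag name x is acceptable; the function name force_placeholders and the tag name are illustrative choices, not something prescribed in this thread.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Assumption: placeholders are single tokens such as {70}.
PLACEHOLDER = re.compile(r"\{\d+\}")


def force_placeholders(sentence: str, tag: str = "x") -> str:
    """Wrap each placeholder in markup that proposes itself as translation."""
    def wrap(match: "re.Match[str]") -> str:
        text = match.group(0)
        return f"<{tag} translation={quoteattr(text)}>{escape(text)}</{tag}>"

    return PLACEHOLDER.sub(wrap, sentence)


print(force_placeholders("buy {70} and enjoy unlimited Trainings sessions ."))
```

With this approach the models do not need retraining, since the phrase table is bypassed for the marked span.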
Re: [Moses-support] Placeholders missed
Henry,

you might also change your tokenization so that your placeholders remain one token.

On Mon, 2 Jul 2012 10:50:11 +0100, Barry Haddow bhad...@staffmail.ed.ac.uk wrote:
> Hi Henry
>
> Either use the Moses xml-input feature
> http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc5
> or choose a placeholder that does not appear in your phrase table.
>
> cheers - Barry
>
> On Monday 02 July 2012 10:40:17 Henry Hu wrote:
>> Hi guys,
>>
>> I'm attempting to translate English to French. First I replaced some tags with placeholders such as {70}, then decoded, and finally restored the tags. Most placeholders stayed intact through decoding, like this:
>>
>> English: buy { 70 } and enjoy unlimited Trainings sessions .
>> French: acheter { 70 } et amusez-vous illimitée formations sessions .
>>
>> However, some placeholders come out incomplete, like this (the { is missing):
>>
>> English: acheter { 70 } et amusez-vous illimitée formations sessions .
>> French: illimitée des réunions , chaque avec jusqu' à 70 } les participants
>>
>> I guess I should use different placeholders. But what placeholders are good options?
>>
>> Thanks for any suggestions.
>>
>> Best regards,
>> Henry
>
> --
> Barry Haddow
> University of Edinburgh
> +44 (0) 131 651 3173
Re: [Moses-support] Placeholders missed
Hi Henry,

You can also try to exclude the placeholders from the tokenization process, so that your example would look like this:

buy {70} and enjoy unlimited Trainings sessions

This worked pretty well for me. However, it means that you might need to train new models.

Best,
Daniel

-----Original Message-----
From: moses-support-boun...@mit.edu [mailto:moses-support-boun...@mit.edu] On Behalf Of Henry Hu
Sent: 02 July 2012 11:40
To: moses-support@mit.edu
Subject: [Moses-support] Placeholders missed

Hi guys,

I'm attempting to translate English to French. First I replaced some tags with placeholders such as {70}, then decoded, and finally restored the tags. Most placeholders stayed intact through decoding, like this:

English: buy { 70 } and enjoy unlimited Trainings sessions .
French: acheter { 70 } et amusez-vous illimitée formations sessions .

However, some placeholders come out incomplete, like this (the { is missing):

English: acheter { 70 } et amusez-vous illimitée formations sessions .
French: illimitée des réunions , chaque avec jusqu' à 70 } les participants

I guess I should use different placeholders. But what placeholders are good options?

Thanks for any suggestions.

Best regards,
Henry
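Excluding placeholders from tokenization can be sketched with a toy tokenizer that tries the placeholder pattern before anything else. This is a deliberately simplified stand-in, not the Moses tokenizer: it assumes placeholders of the form {70}, and it splits every other punctuation character into its own token (so a hyphenated word such as amusez-vous would be split, unlike in the examples above). The function name tokenize is hypothetical.

```python
import re

# Order matters: a whole placeholder is tried first, then a run of word
# characters, then any single remaining non-space character.
TOKEN = re.compile(r"\{\d+\}|\w+|[^\w\s]")


def tokenize(text: str) -> str:
    """Space-separate tokens while keeping {N} placeholders intact."""
    return " ".join(TOKEN.findall(text))


print(tokenize("buy {70} and enjoy unlimited Trainings sessions."))
```

As noted in the message above, training data has to be tokenized the same way, which is why new models may be needed.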