Re: [Moses-support] placeholders for numbers - extract step

2014-11-19 Thread Hieu Hoang
Hi Vito,

On 18 November 2014 11:30, Vito Mandorino <vito.mandor...@linguacustodia.com>
 wrote:

 Hello everyone,

 I am trying to use placeholders for numbers in phrase-based MT, according
 to http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc75

 The above page says

 ---

  During extraction, add the following to the extract command (phrase-based
 only for now):

 ./extract --Placeholders @num@ 

 --

 Does this mean that I have to first run train-model.perl with
 --last-step=4, then the line above and then again train-model.perl with
 --first-step=6?

When you run train-model.perl, add the argument
   -extract-options '--Placeholders @num@'
You can see an example in this script, which the EMS creates:

http://www.statmt.org/moses/RELEASE-2.1/models/cs-en/steps/3/TRAINING_extract-phrases.3


 If this is the case, which arguments and options should I pass to extract
 for a baseline training? I think the syntax is something like

The script will then call extract with the following argument
   --Placeholders @num@
You can see it in the STDERR file of the above script

http://www.statmt.org/moses/RELEASE-2.1/models/cs-en/steps/3/TRAINING_extract-phrases.3.STDERR


  syntax: extract en de align extract max-length [orientation [ --model
 [wbe|phrase|hier]-[msd|mslr|mono] ] | --OnlyOutputSpanInfo | --NoTTable |
 --GZOutput | --IncludeSentenceId | --SentenceOffset n | --InstanceWeights
 filename ]

 In particular I cannot figure out what should be passed as 'align' and
 'extract' arguments.


 Regards,

 Vito

  --
 M. Vito MANDORINO -- Chief Scientist
 The Translation Trustee
 1, Place Charles de Gaulle, 78180 Montigny-le-Bretonneux
 Tel: +33 1 30 44 04 23   Mobile: +33 6 84 65 68 89
 Email: vito.mandor...@linguacustodia.com
 Website: www.linguacustodia.com - www.thetranslationtrustee.com

 ___
 Moses-support mailing list
 Moses-support@mit.edu
 http://mailman.mit.edu/mailman/listinfo/moses-support




-- 
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu


Re: [Moses-support] placeholders for numbers - extract step

2014-11-19 Thread Vito Mandorino
Thank you Hieu, that worked very well. I am now tackling the decoding part
and I have two questions.


1) Sometimes, I get the following error message during decoding:

terminate called after throwing an instance of 'util::Exception'
  what():  moses-cmd/IOWrapper.cpp:213 in std::map<long unsigned int,
const Moses::Factor*> MosesCmd::GetPlaceholders(const
Moses::Hypothesis&, Moses::FactorType) threw util::Exception because
`targetPos.size() != 1'.
Placeholder should be aligned to 1, and only 1, word
Aborted

I don't understand why. I checked the phrase-table and I didn't find
phrase pairs where the '@num@' token is aligned to 2 or more words.
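
The exception fires when the placeholder is aligned to anything other than exactly one target word, so an unaligned '@num@' would trigger it as well as one aligned to two or more words. The following is a minimal diagnostic sketch, not part of Moses. It assumes a plain-text phrase table whose fourth '|||'-separated field holds source-target word alignments such as "0-0 1-1"; the function name is made up for illustration.

```python
def misaligned_placeholders(lines, placeholder="@num@"):
    """Yield (line_number, line) for phrase-table entries whose source-side
    placeholder is not aligned to exactly one target word."""
    for number, line in enumerate(lines, start=1):
        fields = [field.strip() for field in line.split("|||")]
        if len(fields) < 4:
            continue  # no alignment field to inspect (assumed layout)
        source = fields[0].split()
        links = [tuple(map(int, pair.split("-"))) for pair in fields[3].split()]
        for position, token in enumerate(source):
            if token != placeholder:
                continue
            # Collect every target position linked to this source position.
            targets = {t for s, t in links if s == position}
            if len(targets) != 1:
                yield number, line.rstrip("\n")
                break


sample = [
    "@num@ euros ||| @num@ euro ||| 0.5 0.5 0.5 0.5 ||| 0-0 1-1 ||| 1 1 1",
    "about @num@ ||| roughly @num@ or so ||| 0.1 0.1 0.1 0.1 ||| 0-0 1-1 1-2 ||| 1 1 1",
    "@num@ % ||| percent ||| 0.1 0.1 0.1 0.1 ||| 1-0 ||| 1 1 1",
]
for number, entry in misaligned_placeholders(sample):
    print(number, entry)
```

In the made-up sample, the second entry aligns the placeholder to two target words and the third leaves it unaligned, so both are reported. For a real table, the lines could be read from the (possibly gzipped) phrase-table file instead.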

2) This may be related to the first question. If I run the decoder to
translate the input using the suggested command

./moses  -placeholder-factor 1 -xml-input exclusive

I get the '@num@' string in the output and not the expected number. I
do get the number if I use the option '-placeholder-factor 0'. The
model that I am using is a phrase-based, non-factored model.


Vito






Re: [Moses-support] Placeholders

2013-10-16 Thread Achim Ruopp
<anytag/> is XML-compliant in schema-less XML (as long as the tag name
complies with http://www.w3.org/TR/REC-xml/#NT-Name).

IMHO Moses input (with the -xml-input option) should either stay schema-less, or we
should define a schema. Right now I can't see a pressing reason to define a
schema.

In any case it would be good to parse the input (with the -xml-input option)
with a proper XML parser, e.g.

http://www.boost.org/doc/libs/1_54_0/doc/html/boost_propertytree/parsers.html#boost_propertytree.parsers.xml_parser

There are probably better XML parsers, but Moses already requires Boost. Using
an XML parser could also resolve some of the uncertainty around character escaping.

 

Achim 

 


Re: [Moses-support] Placeholders

2013-10-16 Thread Tom Hoar
The reality is that the current -xml-input functionality straddles the
fence between the schema-less and defined-schema worlds. It's <anytag/>
except <wall/>, <zone/> and <ne/>. Moses currently supports only
four functions with XML markup: specifying alternate translations, walls,
zones and named entities. I'm not sure a full XML parser is necessary
for four functions, but the chance of accidental conflicts grows with
the number of functions.


It seems more efficient to assign a tag name to the only current
function that doesn't have a reserved tag name. Then, the undefined tag
names become the exception that Moses ignores.


Tom



Re: [Moses-support] Placeholders

2013-10-15 Thread Hieu Hoang
They're good ideas. I'll have a think about them, if I get round to doing it.

I would also want to minimise the work I have to do, and minimise the
disruption to people's existing pipelines.




Re: [Moses-support] Placeholders

2013-10-15 Thread support
 

A change from <anytag/> will no doubt disrupt existing pipelines.
Communicating the change with the new release will be a great help.



Re: [Moses-support] Placeholders

2013-10-14 Thread Hieu Hoang
Hi Tom

Sent while bumping into things

 On 13 Oct 2013, at 17:01, Tom Hoar tah...@precisiontranslationtools.com 
 wrote:
 
 Thanks Hieu and Achim for the new feature. I think it's great. Some questions:
 
 1) When invoking mert-moses.pl to tune a model prepared with placeholders, 
 and the dev set includes placeholders, it looks like the new moses command 
 line options (-placeholder-factor 1 -xml-input exclusive) should be placed in 
 the --decoder-flags or in the config file. Can you confirm?
Yep, they are decoder flags.
 
 2) Are there any limits as to what escape sequences are used as placeholders? 
 Your example was @num@. Could this just as easily be %(num)s if carried 
 through all the necessary steps? 
There's no limit on what the placeholder 'word' can be.

There can also be multiple, different placeholder words: @num@ for numbers, 
%(date) for dates, :place: for place names, etc.
 
 3) If we change your example to
 
you owe me $ 42.85 .
 
 and update the ph_numbers.perl to re-format numbers with the target language 
 formatting
 
you owe me $ <ne translation="@num@" entity="42,85">@num@</ne> .
 
 would the corresponding translated output include the 42,85?
Yes, 42,85 will be the output. 

The placeholder script should be language-pair specific. There are flags to 
specify the source and target language in the script, but I don't think they are used at 
the moment. You should extend it.
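
As a rough illustration of what a language-pair-specific extension could look like, here is a Python sketch that rewrites the decimal separator before the number is stored in the entity attribute. It is not the actual ph_numbers.perl logic. The separator table, the language codes and both function names are assumptions made for this example, and thousands separators are deliberately ignored.

```python
# Hypothetical decimal separators per language code; extend as needed.
DECIMAL_SEPARATOR = {"en": ".", "fr": ",", "de": ","}


def localize_decimal(number, source_lang, target_lang):
    """Swap the decimal separator of a plain number such as 42.85.

    Thousands separators are not handled in this sketch.
    """
    src = DECIMAL_SEPARATOR[source_lang]
    tgt = DECIMAL_SEPARATOR[target_lang]
    whole, sep, fraction = number.rpartition(src)
    if not sep:
        return number  # no decimal part, nothing to change
    return f"{whole}{tgt}{fraction}"


def mark_number(number, source_lang, target_lang, placeholder="@num@"):
    """Build the <ne> markup with the target-formatted number as the entity."""
    entity = localize_decimal(number, source_lang, target_lang)
    return f'<ne translation="{placeholder}" entity="{entity}">{placeholder}</ne>'


print("you owe me $ " + mark_number("42.85", "en", "fr") + " .")
```

With the en-fr pair assumed here, 42.85 is stored as 42,85, matching the example in the question.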
 
 4) If the entity= value must include reserved/special characters, such as 
 the XML-special characters or the Moses-restricted vertical bar |, should they be 
 escaped within the quotes, the way the tokenizer.perl and 
 escape-special-chars.perl scripts escape them?
Dunno. Haven't kicked the tyres on this yet. 

You should err on the safe side and escape it. Also, since you have to unescape 
the whole output sentence, not escaping it may cause you problems.
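
A minimal Python sketch of escaping the entity value before it goes into the markup. The replacement table is assumed to mirror what escape-special-chars.perl does, so check it against your copy of that script. As noted above, how the decoder handles an escaped entity value has not been tested.

```python
# Replacements assumed to mirror escape-special-chars.perl; verify against
# the copy of the script you actually use.
MOSES_ESCAPES = [
    ("&", "&amp;"),   # must come first so later entities are not re-escaped
    ("|", "&#124;"),
    ("<", "&lt;"),
    (">", "&gt;"),
    ("'", "&apos;"),
    ('"', "&quot;"),
    ("[", "&#91;"),
    ("]", "&#93;"),
]


def escape_entity(value):
    """Escape characters that are special to XML or to Moses."""
    for raw, escaped in MOSES_ESCAPES:
        value = value.replace(raw, escaped)
    return value


print(escape_entity('a|b & "c"'))
```

The ampersand is replaced first so that the ampersands introduced by the other replacements are not escaped a second time.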
 
 5) The last I recall, the -xml-input option wasn't particular about what XML 
 tag is used. Is this still true, i.e. could the example use <anytag/> and still 
 work the same?

No, it must be <ne ...>

In fact, we're thinking of changing <anytag/> to something fixed, like 
<option/> 

The <anytag/> behaviour isn't good XML and will cause problems in the future

Any opinions on this gratefully received 

 
 6) Any chance to backport this feature to RELEASE-1.0? How much work do you 
 think would be involved? If we choose to do the backport, can you point us in 
 the right direction and do you want the updates for a RELEASE-1.1?
Can't add this to release 1. It depends on stuff that's only in the current 
github code 

The current code will read most ini files you create with release 1, so that 
should lessen your pain

However, it would be good if you can move to release 2.0; it would cause fewer 
headaches for you and me. The ini file shouldn't change from what we have now 
in github
 
 Thanks,
 Tom
 
 
 
 
 On 10/10/2013 08:30 PM, Hieu Hoang wrote:
 
 
 
 On 10 October 2013 13:33, Nicola Bertoldi berto...@fbk.eu wrote:
 Hi Hieu
 
 I read the documentation
 and you mention that you enable the exclusive mode of xml-input
 
 I see few issues:
 
 - you mention that you enable the exclusive mode of xml-input;
   this can conflict with other usage of xml-input which instead require the 
  inclusive mode.
   do you have any comments on that?
 
 it can be exclusive, inclusive or anything else except pass-through. It just 
 requires the XML handling to run
  
 
 - when you use the exclusive mode you force the translation of the span 
 (@num@) with 100)
   and other larger span including @num@ are not allowed
   am I right?
   If yes, what is the advantage of having phrase pairs including other words
 
 it doesn't create XML options, it just needs the XML parsing to run.
  
 
 - what is the meaning of  -placeholder-factor 1 ?
 It stores the original text in the source factor 1. The placeholder symbol 
 is in the factor 0, or whatever the translation model was configured to use.
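
One way to picture the two factors is the Python sketch below. It illustrates the idea only and is not the decoder's implementation. Each source word is modelled as a (factor 0, factor 1) pair, and a placeholder in the output is swapped for factor 1 of the single source word it is aligned to. The function name, the data layout and the French target sentence are all made up for the example.

```python
def restore_placeholders(target_tokens, source_words, alignment, placeholder="@num@"):
    """Replace placeholder tokens in the output with the original text.

    source_words: list of (factor0, factor1) tuples; factor 1 carries the
    original text for placeholder words and is empty otherwise.
    alignment: dict mapping a target position to its source position.
    """
    restored = []
    for position, token in enumerate(target_tokens):
        source_position = alignment.get(position)
        if token == placeholder and source_position is not None:
            restored.append(source_words[source_position][1])
        else:
            restored.append(token)
    return restored


source = [("you", ""), ("owe", ""), ("me", ""), ("$", ""), ("@num@", "100"), (".", "")]
target = ["vous", "me", "devez", "@num@", "$", "."]
alignment = {0: 0, 1: 2, 2: 1, 3: 4, 4: 3, 5: 5}
print(" ".join(restore_placeholders(target, source, alignment)))
```

The translation model only ever sees factor 0 (@num@), which is why the phrase table contains the placeholder symbol and not the individual numbers.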
  
 
 
 Nicola Bertoldi
 
 
 
 
 On Oct 10, 2013, at 1:05 PM, Hieu Hoang wrote:
 
 Hi all
 
 Achim and I have been working on adding support for placeholders into 
 Moses. That is, replacing a number, date, or named entity with a symbol eg. 
 @num@, -date-, =named-entity=. We think it would be especially useful for 
 commercial users of Moses, and for people translating text with lots of 
 numbers, dates etc.
 
 It is now supported in the Moses training and decoding pipeline. See the 
 following URL  for more details.
 http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc60
 

Re: [Moses-support] Placeholders

2013-10-14 Thread Tom Hoar
I agree that <anytag/> could cause problems, especially with the growing 
list of reserved tag names (ne, wall, zone). I wholeheartedly support a 
fixed tag, but I'm not sure <option/> is it. What about <np/> (already in 
the manual) or <xml-markup/> or <xml-input/> or <moses/>?

Here's another idea. The -xml-input flag supports the values exclusive, 
inclusive, ignore and pass-through. What about changing the flag 
to a boolean flag? Then, use the values as the XML tags: <exclusive/>, 
<inclusive/> and <ignore/>, so that one invocation of Moses would support 
all modes on a per-sentence basis. Just a thought. I think this would also 
be easier if you dropped the pass-through option, because there would be no need for 
backwards compatibility.

Another idea, although on a slightly different subject. Moses' 
-monotone-at-punctuation flag would be more useful if we could 
define/override the punctuation symbols that we want it to use. I'm not 
sure how best to accomplish this.

Tom



On 10/15/2013 04:07 AM, Hieu Hoang wrote:
 In fact, we're thinking of changing <anytag/> to something fixed, like 
 <option/>

 The <anytag/> behaviour isn't good XML and will cause problems in the 
 future

 Any opinions on this gratefully received




Re: [Moses-support] Placeholders

2013-10-13 Thread Tom Hoar
Thanks Hieu and Achim for the new feature. I think it's great. Some 
questions:


1) When invoking mert-moses.pl to tune a model prepared with 
placeholders, and the dev set includes placeholders, it looks like the 
new moses command line options (-placeholder-factor 1 -xml-input 
exclusive) should be placed in the --decoder-flags or in the config 
file. Can you confirm?


2) Are there any limits as to what escape sequences are used as 
placeholders? Your example was @num@. Could this just as easily be 
%(num)s if carried through all the necessary steps?


3) If we change your example to

   you owe me $ 42.85 .

and update the ph_numbers.perl to re-format numbers with the target 
language formatting


   you owe me $ <ne translation="@num@" entity="42,85">@num@</ne> .

would the corresponding translated output include the 42,85?

4) If the entity= value must include reserved/special characters, such 
as the XML-special characters or the Moses-restricted vertical bar |, should they 
be escaped within the quotes, the way the tokenizer.perl and 
escape-special-chars.perl scripts escape them?


5) The last I recall, the -xml-input option wasn't particular about 
what XML tag is used. Is this still true, i.e. could the example use <anytag/> 
and still work the same?


6) Any chance to backport this feature to RELEASE-1.0? How much work do 
you think would be involved? If we choose to do the backport, can you 
point us in the right direction and do you want the updates for a 
RELEASE-1.1?


Thanks,
Tom






Re: [Moses-support] Placeholders

2013-10-11 Thread Per Tunedal
Hi,
placeholders would be useful. What's the implication of "it just needs
the XML parsing to run"?
Does this mean that the option could only be used with HTML input, not
with plain text?
Yours,
Per Tunedal



Re: [Moses-support] Placeholders

2013-10-11 Thread Hieu Hoang
It doesn't process HTML, just plain text with XML markup to indicate the
placeholder.

For example, your original sentence is
   you owe me $ 100 .
After running it through the placeholder script, the sentence becomes
   you owe me $ <ne translation="@num@" entity="100">@num@</ne> .

The XML processing is needed to parse this XML.
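
For readers who want to see the transformation spelled out, here is a small Python sketch of what such a placeholder script does to a tokenized sentence. It is not the script shipped with Moses. The number pattern and the function name are assumptions, and it emits the <ne translation="..." entity="..."> element form of the markup.

```python
import re

# Assumed number pattern: digits, optionally with . or , separated groups.
NUMBER = re.compile(r"^\d+(?:[.,]\d+)*$")


def mark_numbers(sentence, placeholder="@num@"):
    """Wrap each numeric token of a tokenized sentence in <ne> markup."""
    marked = []
    for token in sentence.split():
        if NUMBER.match(token):
            marked.append(
                f'<ne translation="{placeholder}" entity="{token}">{placeholder}</ne>'
            )
        else:
            marked.append(token)
    return " ".join(marked)


print(mark_numbers("you owe me $ 100 ."))
```

The decoder is then run with -placeholder-factor 1 and an -xml-input mode other than pass-through, so that the markup is parsed and not treated as words.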



On 11 October 2013 07:08, Per Tunedal per.tune...@operamail.com wrote:

 Hi,
 place holders would be useful. What's the implication of it just needs
 the XML parsing to run?
 Does this mean that the option could only be used with html-input? Not
 with plain text?
 Yours,
 Per Tunedal

 On Thu, Oct 10, 2013, at 15:30, Hieu Hoang wrote:
  On 10 October 2013 13:33, Nicola Bertoldi berto...@fbk.eu wrote:
 
   Hi Hieu
  
   I read the documentation
   and you mention that you enable the exclusive mode of xml-input
  
   I see few issues:
  
   - you mention that you enable the exclusive mode of xml-input;
 this can conflict with other usage of xml-input which instead require
   the  inclusive mode.
 do you have any comments on that?
  
 
  it can be exclusive, inclusive or anything else except pass-through. It
  just requires the XML handling to run
 
 
  
   - when you use the exclusive mode you force the translation of the span
   (@num@) with 100)
 and other larger span including @num@ are not allowed
 am I right?
 If yes, what is the advantage of having phrase pairs including other
   words
  
 
  it doesn't create XML options, it just needs the XML parsing to run.
 
 
  
   - what is the meaning of  -placeholder-factor 1 ?
  
  It stores the original text in the source factor 1. The placeholder
  symbol
  is in the factor 0, or whatever the translation model was configured to
  use.
 
 
  
  
   Nicola Bertoldi
  
  
  
  
   On Oct 10, 2013, at 1:05 PM, Hieu Hoang wrote:
  
   Hi all
  
   Achim and I have been working on adding support for placeholders into
   Moses. That is, replacing a number, date, or named entity with a
 symbol eg.
   @num@, -date-, =named-entity=. We think it would be especially useful
 for
   commercial users of Moses, and for people translating text with lots of
   numbers, dates etc.
  
   It is now supported in the Moses training and decoding pipeline. See
 the
   following URL  for more details.
  http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc60
  
   --
   Hieu Hoang
   Research Associate
   University of Edinburgh
   http://www.hoang.co.uk/hieu
  
   ___
   Moses-support mailing list
   Moses-support@mit.edu
   http://mailman.mit.edu/mailman/listinfo/moses-support
  
  
  
 
 
  --
  Hieu Hoang
  Research Associate
  University of Edinburgh
  http://www.hoang.co.uk/hieu




-- 
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu


Re: [Moses-support] Placeholders

2013-10-11 Thread Per Tunedal
Hi Hieu,
Excellent! Thank you for your explanation.
Yours,
Per Tunedal



Re: [Moses-support] Placeholders

2013-10-10 Thread Nicola Bertoldi
Hi Hieu

I read the documentation
and you mention that you enable the exclusive mode of xml-input

I see a few issues:

- you mention that you enable the exclusive mode of xml-input;
  this can conflict with other uses of xml-input which instead require the
  inclusive mode.
  Do you have any comments on that?

- when you use the exclusive mode you force the translation of the span
  ("@num@" with "100"),
  and other larger spans including @num@ are not allowed.
  Am I right?
  If yes, what is the advantage of having phrase pairs that include other words?

- what is the meaning of -placeholder-factor 1?


Nicola Bertoldi




On Oct 10, 2013, at 1:05 PM, Hieu Hoang wrote:

Hi all

Achim and I have been working on adding support for placeholders into Moses. 
That is, replacing a number, date, or named entity with a symbol eg. @num@, 
-date-, =named-entity=. We think it would be especially useful for commercial 
users of Moses, and for people translating text with lots of numbers, dates etc.

It is now supported in the Moses training and decoding pipeline. See the 
following URL  for more details.
   http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc60

--
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu



Re: [Moses-support] Placeholders

2013-10-10 Thread Hieu Hoang
On 10 October 2013 13:33, Nicola Bertoldi berto...@fbk.eu wrote:

 Hi Hieu

 I read the documentation
 and you mention that you enable the exclusive mode of xml-input

 I see few issues:

 - you mention that you enable the exclusive mode of xml-input;
   this can conflict with other usage of xml-input which instead require
 the  inclusive mode.
   do you have any comments on that?


it can be exclusive, inclusive or anything else except pass-through. It
just requires the XML handling to run



 - when you use the exclusive mode you force the translation of the span
 (@num@) with 100)
   and other larger span including @num@ are not allowed
   am I right?
   If yes, what is the advantage of having phrase pairs including other
 words


it doesn't create XML options, it just needs the XML parsing to run.



 - what is the meaning of  -placeholder-factor 1 ?

It stores the original text in the source factor 1. The placeholder symbol
is in the factor 0, or whatever the translation model was configured to use.
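
To picture the two factors, here is a small Python sketch. The Token type and the restore function are illustrative names, not Moses internals, and the in-order substitution on the output side is an assumption made only for this example.

```python
from typing import NamedTuple

class Token(NamedTuple):
    factor0: str  # what the translation model sees, e.g. "@num@"
    factor1: str  # original text kept alongside (-placeholder-factor 1)

def restore(output_words, source_tokens, symbol="@num@"):
    """Replace each placeholder symbol in the output with the next stored original."""
    originals = iter(t.factor1 for t in source_tokens if t.factor0 == symbol)
    return [next(originals, w) if w == symbol else w for w in output_words]

# Source sentence "you owe me $ 100 ." with the number replaced by its symbol.
source = [Token("you", ""), Token("owe", ""), Token("me", ""),
          Token("$", ""), Token("@num@", "100"), Token(".", "")]

# A made-up decoder output that still contains the placeholder symbol.
print(restore(["tu", "me", "dois", "@num@", "$", "."], source))
```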




 Nicola Bertoldi




 On Oct 10, 2013, at 1:05 PM, Hieu Hoang wrote:

 Hi all

 Achim and I have been working on adding support for placeholders into
 Moses. That is, replacing a number, date, or named entity with a symbol eg.
 @num@, -date-, =named-entity=. We think it would be especially useful for
 commercial users of Moses, and for people translating text with lots of
 numbers, dates etc.

 It is now supported in the Moses training and decoding pipeline. See the
 following URL  for more details.
http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc60

 --
 Hieu Hoang
 Research Associate
 University of Edinburgh
 http://www.hoang.co.uk/hieu






-- 
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu


Re: [Moses-support] Placeholders missed

2012-07-31 Thread Henry Hu
Thanks Daniel and Tomas.

I solved the issue with the PHP script below. The solution is
like Daniel's: delete the blank spaces inside the placeholders, so that
each placeholder is treated as a single token in the decoding process.
I ran the script just before decoding.

<?php

// Read the input file and write a copy with compacted placeholders.
$fp = fopen("true.en", "r");
$fo = fopen("out_compacted.en", "w");

while (($line = fgets($fp)) !== false) {
  // Rejoin tokenized placeholders, e.g. "{ 70 }" becomes "{70}".
  $line = preg_replace('/\{ (\d+) \}/', '{$1}', $line);
  fputs($fo, $line);
}

fclose($fp);
fclose($fo);


Thanks,
Henry


On Tue, Jul 10, 2012 at 2:55 PM, Tomas Hudik thu...@moravia.com wrote:
 Hi Henry,
 This answer is coming late probably, but:

 We have developed small sw for placeholder translation.
 It is under the same license as Moses.
 http://code.google.com/p/m4loc/
 If you want to try it - download sources (it is perl mostly, so you do not 
 need to compile it). The input should be tmx, or xliff file (localization 
 file formats).
 Be aware results won't be 100% correct.


 Cheers, Tomas


 -Original Message-
 From: Henry Hu [mailto:henryhu...@gmail.com]
 Sent: Monday, July 02, 2012 11:40 AM
 To: moses-support@mit.edu
 Subject: [Moses-support] Placeholders missed

 Hi guys,

 I'm attempting to translate English to French. First I replaced some tags 
 with placeholders {70}. Next, decoding. Finally, restoring tags.
 Most placeholders {70} maintained the same in the process of decoding, like 
 this:

 English: buy { 70 } and enjoy unlimited Trainings sessions .
 French:  acheter { 70 } et amusez-vous illimitée formations sessions .

 However, some placeholders are incomplete, like this( missed { ):

 English: acheter { 70 } et amusez-vous illimitée formations sessions .
 French:  illimitée des réunions , chaque avec jusqu' à 70 } les participants

 I guess I should use other placeholders. But what placeholders can be 
 options? Thanks for any suggestion.

 Best regards,
 Henry





Re: [Moses-support] Placeholders missed

2012-07-10 Thread Tomas Hudik
Hi Henry,
This answer is probably coming late, but:

We have developed a small piece of software for placeholder translation.
It is under the same license as Moses.
http://code.google.com/p/m4loc/
If you want to try it, download the sources (it is mostly Perl, so you do not
need to compile it). The input should be a TMX or XLIFF file (localization file
formats).
Be aware that the results won't be 100% correct.


Cheers, Tomas


-Original Message-
From: Henry Hu [mailto:henryhu...@gmail.com] 
Sent: Monday, July 02, 2012 11:40 AM
To: moses-support@mit.edu
Subject: [Moses-support] Placeholders missed

Hi guys,

I'm attempting to translate English to French. First I replaced some tags with 
placeholders {70}. Next, decoding. Finally, restoring tags.
Most placeholders {70} maintained the same in the process of decoding, like 
this:

English: buy { 70 } and enjoy unlimited Trainings sessions .
French:  acheter { 70 } et amusez-vous illimitée formations sessions .

However, some placeholders are incomplete, like this( missed { ):

English: acheter { 70 } et amusez-vous illimitée formations sessions .
French:  illimitée des réunions , chaque avec jusqu' à 70 } les participants

I guess I should use other placeholders. But what placeholders can be options? 
Thanks for any suggestion.

Best regards,
Henry





Re: [Moses-support] Placeholders missed

2012-07-02 Thread Barry Haddow
Hi Henry

Either use the Moses xml-input feature
http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc5

or choose a placeholder that does not appear in your phrase table,

cheers - Barry
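
A rough way to check the second option, sketched in Python. It assumes a plain-text phrase table with "source ||| target ||| scores" lines; the function names and the two sample entries are made up for illustration.

```python
def tokens_in_phrase_table(lines):
    """Collect every source and target token from 'src ||| tgt ||| ...' lines."""
    seen = set()
    for line in lines:
        fields = line.split("|||")
        for field in fields[:2]:  # source and target phrases only
            seen.update(field.split())
    return seen

def is_safe_placeholder(candidate, lines):
    """A candidate is safe if none of its tokens occur anywhere in the table."""
    return tokens_in_phrase_table(lines).isdisjoint(candidate.split())

# Invented sample entries, not taken from a real model.
sample = [
    "{ ||| { ||| 0.5 0.5 0.5 0.5",
    "up to 70 } ||| jusqu' à 70 } ||| 0.1 0.1 0.1 0.1",
]
print(is_safe_placeholder("{ 70 }", sample))
print(is_safe_placeholder("__PH70__", sample))
```

A real phrase table is large and usually compressed, so in practice the lines would be streamed from the file rather than held in a list.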

On Monday 02 July 2012 10:40:17 Henry Hu wrote:
 Hi guys,
 
 I'm attempting to translate English to French. First I replaced some
 tags with placeholders {70}. Next, decoding. Finally, restoring tags.
 Most placeholders {70} maintained the same in the process of decoding,
 like this:
 
 English: buy { 70 } and enjoy unlimited Trainings sessions .
 French:  acheter { 70 } et amusez-vous illimitée formations sessions .
 
 However, some placeholders are incomplete, like this( missed { ):
 
 English: acheter { 70 } et amusez-vous illimitée formations sessions .
 French:  illimitée des réunions , chaque avec jusqu' à 70 } les
  participants
 
 I guess I should use other placeholders. But what placeholders can be
 options? Thanks for any suggestion.
 
 Best regards,
 Henry
 
 
 
--
Barry Haddow
University of Edinburgh
+44 (0) 131 651 3173

-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.




Re: [Moses-support] Placeholders missed

2012-07-02 Thread Tom Hoar
 Henry, you might also change your tokenization so your placeholders 
 remain one token.


 On Mon, 2 Jul 2012 10:50:11 +0100, Barry Haddow 
 bhad...@staffmail.ed.ac.uk wrote:
 Hi Henry

 Either use Moses xml-input feature
 http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc5

 or choose a place holder that does not appear in your phrase table,

 cheers - Barry

 On Monday 02 July 2012 10:40:17 Henry Hu wrote:
 Hi guys,

 I'm attempting to translate English to French. First I replaced some
 tags with placeholders {70}. Next, decoding. Finally, restoring 
 tags.
 Most placeholders {70} maintained the same in the process of 
 decoding,
 like this:

 English: buy { 70 } and enjoy unlimited Trainings sessions .
 French:  acheter { 70 } et amusez-vous illimitée formations sessions 
 .

 However, some placeholders are incomplete, like this( missed { ):

 English: acheter { 70 } et amusez-vous illimitée formations sessions 
 .
 French:  illimitée des réunions , chaque avec jusqu' à 70 } les
  participants

 I guess I should use other placeholders. But what placeholders can 
 be
 options? Thanks for any suggestion.

 Best regards,
 Henry



 --
 Barry Haddow
 University of Edinburgh
 +44 (0) 131 651 3173

 --
 The University of Edinburgh is a charitable body, registered in
 Scotland, with registration number SC005336.




Re: [Moses-support] Placeholders missed

2012-07-02 Thread Daniel Schaut
Hi Henry,

You can also try to exclude the placeholders from the tokenization process so
that your example would look like this:

buy {70} and enjoy unlimited Trainings sessions

This worked pretty well for me. However, it means that you might need to
train new models.
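
As an illustration of the idea, here is a minimal Python sketch of a tokenizer that keeps a placeholder such as {70} as one token. It is not the Moses tokenizer; the pattern and the function name tokenize are assumptions for the example.

```python
import re

# Try the placeholder first, then words (allowing inner hyphens and
# apostrophes), then any single punctuation character.
TOKEN = re.compile(r"\{\d+\}|\w+(?:[-']\w+)*|[^\w\s]")

def tokenize(text):
    """Split text into tokens without breaking {N} placeholders apart."""
    return TOKEN.findall(text)

print(tokenize("buy {70} and enjoy unlimited Trainings sessions."))
```

Because the placeholder alternative comes first in the pattern, the braces are never emitted as separate tokens.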

Best,
Daniel

-Original Message-
From: moses-support-boun...@mit.edu [mailto:moses-support-boun...@mit.edu] On
Behalf Of Henry Hu
Sent: 02 July 2012 11:40
To: moses-support@mit.edu
Subject: [Moses-support] Placeholders missed

Hi guys,

I'm attempting to translate English to French. First I replaced some tags
with placeholders {70}. Next, decoding. Finally, restoring tags.
Most placeholders {70} maintained the same in the process of decoding, like
this:

English: buy { 70 } and enjoy unlimited Trainings sessions .
French:  acheter { 70 } et amusez-vous illimitée formations sessions .

However, some placeholders are incomplete, like this( missed { ):

English: acheter { 70 } et amusez-vous illimitée formations sessions .
French:  illimitée des réunions , chaque avec jusqu' à 70 } les participants

I guess I should use other placeholders. But what placeholders can be
options? Thanks for any suggestion.

Best regards,
Henry
