Updates to tokenizer.perl and detokenizer.perl include escaping
characters that Moses reserves for other uses, such as table delimiters
and brackets. Is there a reason the updates omit the apostrophe
and the quote, which are two of the five XML reserved characters? Not
escaping them could affect communication with mosesserver.

I propose adding these two escape sequences to the tokenizer and
detokenizer scripts.

tokenizer.perl, line 151
 #escape special chars
 $text =~ s/&/&amp;/g;   # XML
 $text =~ s/\|/&bar;/g;  # moses
 $text =~ s/</&lt;/g;    # XML
 $text =~ s/>/&gt;/g;    # XML
 $text =~ s/\[/&bra;/g;  # moses
 $text =~ s/\]/&ket;/g;  # moses
 $text =~ s/'/&apos;/g;  # XML
 $text =~ s/"/&quot;/g;  # XML

detokenizer.perl, line 67
 # de-escape special chars
 $text =~ s/&apos;/'/g;  # XML
 $text =~ s/&quot;/"/g;  # XML
 $text =~ s/&bar;/|/g;   # moses
 $text =~ s/&lt;/</g;    # XML
 $text =~ s/&gt;/>/g;    # XML
 $text =~ s/&bra;/[/g;   # moses
 $text =~ s/&ket;/]/g;   # moses
 $text =~ s/&amp;/&/g;   # XML
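
To check that the two lists of substitutions are inverses of each other,
here is a minimal round-trip sketch. The sub names escape_special and
deescape_special are hypothetical, and the entity names &bar;, &bra; and
&ket; for the moses characters are an assumption on my part; only the
five XML entities are fixed by the XML specification. The point it
illustrates is ordering: the ampersand has to be escaped first and
de-escaped last, otherwise text that already contains something like
"&lt;" would not survive the round trip.

```perl
use strict;
use warnings;

# Hypothetical helper: applies the escapes in tokenizer order.
sub escape_special {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # XML; first, so entities added below are not re-escaped
    $text =~ s/\|/&bar;/g;   # moses (entity name assumed)
    $text =~ s/</&lt;/g;     # XML
    $text =~ s/>/&gt;/g;     # XML
    $text =~ s/\[/&bra;/g;   # moses (entity name assumed)
    $text =~ s/\]/&ket;/g;   # moses (entity name assumed)
    $text =~ s/'/&apos;/g;   # XML
    $text =~ s/"/&quot;/g;   # XML
    return $text;
}

# Hypothetical helper: undoes the escapes in detokenizer order.
sub deescape_special {
    my ($text) = @_;
    $text =~ s/&apos;/'/g;   # XML
    $text =~ s/&quot;/"/g;   # XML
    $text =~ s/&bar;/|/g;    # moses
    $text =~ s/&lt;/</g;     # XML
    $text =~ s/&gt;/>/g;     # XML
    $text =~ s/&bra;/[/g;    # moses
    $text =~ s/&ket;/]/g;    # moses
    $text =~ s/&amp;/&/g;    # XML; last, so a literal "&lt;" in the input is restored as-is
    return $text;
}

my $in      = q{He said "it's <b>[x|y]</b> & AT&amp;T"};
my $escaped = escape_special($in);
my $out     = deescape_special($escaped);

print "$escaped\n";
print $out eq $in ? "round trip ok\n" : "round trip FAILED\n";
```

If the real scripts use different strings for the moses characters, only
the three lines marked "entity name assumed" in each sub need to change.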

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
