The updates to tokenizer.perl and detokenizer.perl escape characters that Moses reserves for other uses, such as table delimiters (|) and brackets ([ and ]). Is there a reason the updates leave out the apostrophe and the double quote, which are two of the five XML reserved characters? Leaving them unescaped could cause problems when communicating with mosesserver.
I propose adding these two escape sequences to the tokenizer and detokenizer scripts.

tokenizer.perl, line 151:

    # escape special chars
    $text =~ s/&/&amp;/g;    # XML (must come first)
    $text =~ s/\|/&#124;/g;  # moses
    $text =~ s/</&lt;/g;     # XML
    $text =~ s/>/&gt;/g;     # XML
    $text =~ s/\[/&#91;/g;   # moses
    $text =~ s/\]/&#93;/g;   # moses
    $text =~ s/'/&apos;/g;   # XML (proposed addition)
    $text =~ s/"/&quot;/g;   # XML (proposed addition)

detokenizer.perl, line 67:

    # de-escape special chars
    $text =~ s/&apos;/'/g;   # XML (proposed addition)
    $text =~ s/&quot;/"/g;   # XML (proposed addition)
    $text =~ s/&#124;/|/g;   # moses
    $text =~ s/&lt;/</g;     # XML
    $text =~ s/&gt;/>/g;     # XML
    $text =~ s/&#91;/[/g;    # moses
    $text =~ s/&#93;/]/g;    # moses
    $text =~ s/&amp;/&/g;    # XML (must come last)
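Below is a minimal, self-contained sketch of the proposed round trip. It assumes the numeric entities &#124;, &#91; and &#93; for the Moses-reserved characters, and it wraps the substitutions in two hypothetical subroutines (escape_special and unescape_special, which are not names from the Moses scripts). It also shows why the ampersand has to be escaped first and de-escaped last: otherwise a literal "&lt;" in the input would come back as "<".

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: applies the tokenizer-side substitutions in order.
    sub escape_special {
        my ($text) = @_;
        $text =~ s/&/&amp;/g;    # first, so entities added below are not re-escaped
        $text =~ s/\|/&#124;/g;  # moses (assumed entity)
        $text =~ s/</&lt;/g;     # XML
        $text =~ s/>/&gt;/g;     # XML
        $text =~ s/\[/&#91;/g;   # moses (assumed entity)
        $text =~ s/\]/&#93;/g;   # moses (assumed entity)
        $text =~ s/'/&apos;/g;   # XML, proposed addition
        $text =~ s/"/&quot;/g;   # XML, proposed addition
        return $text;
    }

    # Hypothetical helper: applies the detokenizer-side substitutions in order.
    sub unescape_special {
        my ($text) = @_;
        $text =~ s/&apos;/'/g;
        $text =~ s/&quot;/"/g;
        $text =~ s/&#124;/|/g;
        $text =~ s/&lt;/</g;
        $text =~ s/&gt;/>/g;
        $text =~ s/&#91;/[/g;
        $text =~ s/&#93;/]/g;
        $text =~ s/&amp;/&/g;    # last, so "&amp;lt;" decodes to "&lt;", not "<"
        return $text;
    }

    my $input   = q{He said "it's <b>A|B</b> [x] & &lt;"};
    my $escaped = escape_special($input);
    print "$escaped\n";

    # The de-escaped text should match the original input exactly.
    my $status = unescape_special($escaped) eq $input ? "round trip ok" : "round trip FAILED";
    print "$status\n";
    ```

With the apostrophe and quote added, the escaped string contains none of the reserved characters, so it can be embedded in XML sent to mosesserver without further quoting.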
_______________________________________________
Moses-support mailing list
http://mailman.mit.edu/mailman/listinfo/moses-support
