This is a separate issue from the parallel "Tokenization problem" thread...
The tokenizer.perl script has one line that transforms the grave accent (`)
to an apostrophe and another that transforms a double apostrophe ('') to a
single quote. I suspect these have been in the script since the
beginning. However, they "bit" me on a recent project. It was easy
enough to work around.
Still, I'm wondering: do they still belong in the tokenizer.perl script,
or should they be moved into one of the other scripts? The
normalize-punctuation.perl script seems to be a good candidate.
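
For anyone who wants to see the effect, here is a rough Python sketch of the two substitutions as described above. It is not the actual Perl from tokenizer.perl: the function name normalize_quotes is made up for illustration, and the order of the two steps is an assumption.

```python
def normalize_quotes(text: str) -> str:
    """Approximate the two quote rewrites described in this message.

    Hypothetical helper, not code taken from tokenizer.perl.
    """
    # Step 1 (assumed to run first): grave accent becomes an apostrophe.
    text = text.replace("`", "'")
    # Step 2: a double apostrophe collapses to a single quote.
    text = text.replace("''", "'")
    return text


if __name__ == "__main__":
    # LaTeX-style quoting loses its distinct open/close marks.
    print(normalize_quotes("``quoted''"))
    # A stray grave accent used as an apostrophe is normalized.
    print(normalize_quotes("it`s"))
```

Under these assumptions, any input that relies on the grave accent or on paired apostrophes (LaTeX-style quotes, for example) is silently rewritten at tokenization time, which is the kind of surprise I ran into.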