It's probably a good idea to make this change. If you've already done
it, please send me the updated scripts and I'll check them in. If not,
I'll do it myself.

There's hopefully a fast C++ tokenizer replacement coming soon.
Highlighting these issues now is useful for understanding exactly how
the tokenizer works and how it should work.
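
For anyone following along, here is a rough sketch of the two rules as
the quoted message below describes them, pulled out into a standalone
normalization step. It is Python rather than Perl and is not the code
from tokenizer.perl. The function name and the
double_apostrophe_replacement parameter are made up for illustration,
and the default simply follows the wording of the message ("single
quote").

```python
def normalize_quotes(text, double_apostrophe_replacement="'"):
    """Hypothetical helper, not part of Moses.

    Applies the two substitutions described in the thread as a separate
    normalization pass instead of inside the tokenizer.
    """
    # Rule 1: grave accent (`) becomes an apostrophe (').
    text = text.replace("`", "'")
    # Rule 2: two consecutive apostrophes ('') become the replacement.
    # Assumption: the replacement character is taken from the email's
    # description, so it is left configurable here.
    text = text.replace("''", double_apostrophe_replacement)
    return text


if __name__ == "__main__":
    print(normalize_quotes("He said ``hello'' to me"))
```

Note that the order matters: because rule 1 runs first, a pair of grave
accents turns into a pair of apostrophes and is then caught by rule 2.
Keeping both rules in one separate step would make that interaction
easier to see and to switch off.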
On 15/01/15 01:52, Tom Hoar wrote:
> This is a separate issue from the parallel "Tokenization problem" thread...
>
> The tokenizer.perl has had one line that transforms the grave accent (`)
> to apostrophe and another that transforms double apostrophe ('') to
> single quote. I suspect these have been in the script since the
> beginning. However, they "bit" me on a recent project. Easy
> enough to work around.
>
> Still, I'm wondering. Do they still belong in the tokenizer.perl script?
> Or, should they be moved into one of the other scripts? The
> normalize-punctuation.perl script seems to be a good candidate.
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>