I did not make the change; I merely worked around the issue (Perl's not
my thing). I think the code is on lines 282-286 and 388-397. Just search
for the `, which also appears in a couple of other places that deserve
closer scrutiny.
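
If it helps whoever makes the change, here is a rough sketch of that
search in Python rather than Perl. It only lists the lines that contain
the grave accent so each one can be reviewed. The function name
find_char_lines and the sample text are hypothetical; the sample is a
stand-in, not the contents of tokenizer.perl.

```python
def find_char_lines(script_text, needle="`"):
    """Return (line_number, line) pairs for every line containing needle."""
    hits = []
    for number, line in enumerate(script_text.splitlines(), start=1):
        if needle in line:
            hits.append((number, line.strip()))
    return hits


# Hypothetical stand-in for a few script lines, NOT the real tokenizer.perl.
sample = "\n".join([
    "my $text = shift;",
    r"$text =~ s/\`/\'/g;   # hypothetical rule",
    "return $text;",
    "# note: ` also shows up in this comment",
])

# Print each hit as "line number: line" for manual review.
for number, line in find_char_lines(sample):
    print(f"{number}: {line}")
```

To check the real script, read the file into a string first and pass it
to find_char_lines; the path to your copy of the script is whatever your
checkout uses.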
On 01/16/2015 04:36 PM, Hieu Hoang wrote:
> it's probably a good idea to make this change. If you've done it
> already, please send me the updated scripts and I'll check it in. If
> not, I'll do it myself
>
> there's hopefully a fast, C++ tokenizer replacement coming soon.
> Highlighting these issues now is useful for understanding exactly how
> the tokenizer works/should work
>
> On 15/01/15 01:52, Tom Hoar wrote:
>> This is a separate issue from the parallel "Tokenization problem"
>> thread...
>>
>> The tokenizer.perl script has one line that transforms the grave accent
>> (`) to an apostrophe and another that transforms a double apostrophe
>> ('') to a double quote ("). I suspect these have been in the script
>> since the beginning. However, they "bit" me on a recent project. Easy
>> enough to work around.
>>
>> Still, I'm wondering. Do they still belong in the tokenizer.perl script?
>> Or should they be moved into one of the other scripts? The
>> normalize-punctuation.perl script seems to be a good candidate.
>> _______________________________________________
>> Moses-support mailing list
>> [email protected]
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>
>