Re: [Moses-support] Legacy tokenizer.perl functionality.

Hieu Hoang Fri, 16 Jan 2015 03:48:44 -0800

I think it's too difficult to police.

Another idea is to get the script to md5 its own source code, and the 
non-prefix files it uses.
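
A minimal sketch of that fingerprint idea, written in Python purely for illustration (tokenizer.perl itself is Perl, and nothing like this exists in the script as far as this thread shows). The function name `fingerprint` and the choice to mix in file names and lengths are assumptions of the sketch, not part of any Moses tool.

```python
import hashlib
from pathlib import Path


def fingerprint(paths):
    """Return one MD5 hex digest covering every file in `paths`.

    Hypothetical helper: files are hashed in sorted order so the result
    does not depend on the order in which the caller lists them.
    """
    digest = hashlib.md5()
    for path in sorted(Path(p) for p in paths):
        data = path.read_bytes()
        # Mix in the file name and length so that moving bytes from one
        # file to another also changes the fingerprint.
        digest.update(path.name.encode("utf-8"))
        digest.update(str(len(data)).encode("ascii"))
        digest.update(b"\0")
        digest.update(data)
    return digest.hexdigest()
```

A script could call `fingerprint([__file__, ...])` on its own source plus the data files it loads and print the result alongside its output. Any edit to those files then yields a different digest, although the digest also changes for edits that leave the output untouched, such as a new comment.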


On 16/01/15 11:12, Christian Hardmeier wrote:
> On Jan 16, 2015, at 11:51 AM, Tom Hoar wrote:
>
>> I agree with versioning. Could be added to the command line.
>>
>> Also agree that this proposed change qualifies as a version change.
>>
>> How do you propose managing the issue of output changes due to
>> command-line switches, like -no-escape?
> Very good question. To be consistent, you'd probably have to increment the 
> version number even if the change only applies when you use a certain 
> command-line switch. But not if it doesn't affect the output, and maybe not if 
> you just add a new command-line switch that is off by default. What do you 
> think?
>
>
>
>>
>> On 01/16/2015 05:36 PM, Christian Hardmeier wrote:
>>> I'd like to suggest that there should be a version number in the tokeniser 
>>> that is incremented whenever the output changes, even if the change is 
>>> minor and even if it's just a bugfix. Otherwise when you pull a new version 
>>> of moses you don't know if the output of tokenizer.perl is still compatible 
>>> with your existing models. (Moving functionality from tokenizer.perl to 
>>> normalize-punctuation.perl would count as a change from my point of view. I 
>>> don't always use normalize-punctuation.perl.)
>>>
>>> /Christian
>>>
>>> On Jan 16, 2015, at 10:36 AM, Hieu Hoang wrote:
>>>
>>>> It's probably a good idea to make this change. If you've done it
>>>> already, please send me the updated scripts and I'll check them in. If
>>>> not, I'll do it myself.
>>>>
>>>> There's hopefully a fast C++ tokenizer replacement coming soon.
>>>> Highlighting these issues now is useful for understanding exactly how the
>>>> tokenizer works and should work.
>>>>
>>>> On 15/01/15 01:52, Tom Hoar wrote:
>>>>> This is a separate issue from the parallel "Tokenization problem" 
>>>>> thread...
>>>>>
>>>>> The tokenizer.perl script has one line that transforms the grave accent (`)
>>>>> to an apostrophe and another that transforms a double apostrophe ('') to a
>>>>> single quote. I suspect these have been in the script since the
>>>>> beginning. However, they "bit" me on a recent project. Easy
>>>>> enough to work around.
>>>>>
>>>>> Still, I'm wondering. Do they still belong in the tokenizer.perl script?
>>>>> Or, should they be moved into one of the other scripts? The
>>>>> normalize-punctuation.perl script seems to be a good candidate.
>>>>> _______________________________________________
>>>>> Moses-support mailing list
>>>>> [email protected]
>>>>> http://mailman.mit.edu/mailman/listinfo/moses-support
