Hi,

thanks - I committed them to SVN.

-phi

On Wed, Sep 15, 2010 at 4:59 PM, Achim Ruopp <[email protected]> wrote:
> I created nonbreaking_prefix files for ES, FR and IT based on some publicly
> available abbreviation lists. They are available here:
> http://code.google.com/p/corpus-tools/source/browse/trunk/Lingua-Sentence/share/
> I would take these with a grain of salt - they need to be reviewed by people
> familiar with the languages. The same location also contains a PT
> nonbreaking_prefix file authored by Hilário Leal Fontes, which I believe is
> accurate.
>
> I also have a script that converts SRX files into nonbreaking_prefix files
> with some manual editing required. Please let me know if you are interested.
>
> Achim
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]]
> On Behalf Of Philipp Koehn
> Sent: Wednesday, September 15, 2010 11:17 AM
> To: Tomas Hudik
> Cc: [email protected]
> Subject: Re: [Moses-support] tokenizer for different languages
>
> Hi,
>
> we only provide the lists for the languages we created.
> We would be happy to include other lists in the distribution,
> if such were made available.
>
> Their purpose is to keep the period after, for instance,
> "Mr." from being split off (no periods are split off if the following
> word is lowercase).
>
> You can use the tokenizer for any other language, and
> it may not make much difference, since a phrase-based model
> will happily translate, say, "Mr ." as a phrase.
>
> -phi
>
> On Wed, Sep 15, 2010 at 2:20 PM, Tomas Hudik <[email protected]> wrote:
>> Hi,
>>
>> I’ve got a question about the tokenizer.perl script.
>> I’m wondering whether it is possible to get nonbreaking_prefix.*
>> files for various languages somewhere. Does such a place exist?
>> Or, how can I tokenize a text file if I don’t have enough knowledge
>> about the particular language?
>>
>> Thanks, Tomas
>>
>> _______________________________________________
>> Moses-support mailing list
>> [email protected]
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>
>

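The splitting rule described in the thread (keep the period after a listed prefix such as "Mr", and do not split a period off when the following word is lowercase) can be illustrated with a short Python sketch. It is a simplified illustration, not the actual logic of tokenizer.perl: it assumes the input is already separated on whitespace and that the prefixes are held in a plain set, and the function name split_final_periods is hypothetical.

```python
from __future__ import annotations


def split_final_periods(text: str, nonbreaking_prefixes: set[str]) -> str:
    """Split a word-final period into its own token unless the rule says to keep it.

    Simplified sketch (hypothetical helper, not tokenizer.perl itself):
    - keep the period if the word before it is a listed nonbreaking prefix;
    - keep the period if the next word starts with a lowercase letter;
    - otherwise split it off as a separate "." token.
    """
    words = text.split()  # assumes whitespace-separated input
    out: list[str] = []
    for i, word in enumerate(words):
        if len(word) > 1 and word.endswith("."):
            prefix = word[:-1]
            next_word = words[i + 1] if i + 1 < len(words) else ""
            if prefix in nonbreaking_prefixes:
                out.append(word)  # e.g. "Mr." stays one token
            elif next_word[:1].islower():
                out.append(word)  # lowercase continuation: not treated as a sentence end
            else:
                out.extend([prefix, "."])  # e.g. "arrived." -> "arrived" "."
        else:
            out.append(word)
    return " ".join(out)


print(split_final_periods("Mr. Smith arrived. He sat down.", {"Mr"}))
# → Mr. Smith arrived . He sat down .
```

With "Mr" missing from the prefix set, split_final_periods("Mr. Smith arrived.", set()) yields "Mr . Smith arrived .", which is the "Mr ." case that, as noted above, a phrase-based model can still learn as a phrase.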