2013/6/12 Andriy Rysin <[email protected]>

> I noticed that numbers with fractions like 2,2 are split into '2',
> ',', '2' by the word tokenizer. In Ukrainian I need to require a different
> case for the following noun depending on whether the number is whole or
> fractional, so I was planning to adjust the Ukrainian word tokenizer. But I
> think most European languages use a comma for fractional numbers, so I
> was wondering whether somebody already has a solution or whether this
> would be better done in common code.
>
>
Hi Andriy,

This and other similar things are done in the Catalan word tokenizer. It is
a bit hackish. To make the code more elegant and more general, we could
perhaps do something like the SRX segmentation, but at the word level... Just
an idea. I'm not sure if it is reasonable.
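
For illustration only, here is a minimal sketch of keeping a decimal number such as 2,2 together as one token. It is written in Python for brevity, it is not the Catalan tokenizer's actual code, and the names TOKEN_RE and tokenize are hypothetical. Unlike a real word tokenizer it simply drops whitespace, and it assumes that a comma or period directly between digits is always a decimal separator.

```python
import re

# Hypothetical rule set, tried left to right:
#   1. digits, one comma or period, digits  -> a single number token ("2,2")
#   2. any run of word characters           -> a word or whole number
#   3. any other non-space character        -> a punctuation token
TOKEN_RE = re.compile(r"\d+[.,]\d+|\w+|[^\w\s]")


def tokenize(text):
    """Split text into tokens, keeping decimal numbers in one piece."""
    return TOKEN_RE.findall(text)


if __name__ == "__main__":
    print(tokenize("2,2 кілограма"))
    print(tokenize("яблука, груші"))
```

A comma followed by a space, as in "2, 2", is still split into three tokens, so ordinary enumerations are unaffected. A rule-based approach along the lines of SRX would move such patterns out of the code and into per-language rule files.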

Regards,
Jaume
