* Ben Wiederhake <benwiederh...@gmx.de>, 2016-01-02, 23:13:
The Russian PO file reads:

Plural-Forms: nplurals=4; plural=(n%10==1 && n%100!=11 ? 0 : n%10>=2 && n%10<=4 && (n%100<12 || n%100>14) ? 1 : n%10==0 || (n%10>=5 && n%10<=9) || (n%100>=11 && n%100<=14)? 2 : 3);
[...]
Even though I don't speak Russian, I can tell that this Plural-Forms can't possibly be correct. Here 4 plural forms are declared, but the expression never evaluates to 3.

Since it's just modular arithmetic, one can just parse the formula to fill out a 10x10 table as a "proof". I did that, and came to the same result as you did, without even looking at your program. (Originally I assumed a precedence error / parsing issue / whatever, so I didn't want to start reading C code ... sorry.)

For the record, here's my interpretation, with parentheses added:

((n%10==1 && n%100!=11)
? 0
: ((n%10>=2 && n%10<=4 && (n%100<12 || n%100>14))
   ? 1
   : ((n%10==0 || (n%10>=5 && n%10<=9) || (n%100>=11 && n%100<=14))
      ? 2
      : 3)))
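
The 10x10 table mentioned above can be reproduced mechanically. Here is a minimal sketch in Python, assuming the parenthesised reading shown here is the right one; the function name `plural_ru` is made up for illustration and is not part of i18nspector or gettext. Because the expression only ever looks at n%10 and n%100, evaluating it for 0..99 covers every case.

```python
def plural_ru(n):
    """Hand translation of the Plural-Forms expression above (hypothetical helper)."""
    if n % 10 == 1 and n % 100 != 11:
        return 0
    if 2 <= n % 10 <= 4 and (n % 100 < 12 or n % 100 > 14):
        return 1
    if n % 10 == 0 or 5 <= n % 10 <= 9 or 11 <= n % 100 <= 14:
        return 2
    return 3

# Rows are the tens digit, columns are the ones digit.
table = [[plural_ru(10 * tens + ones) for ones in range(10)] for tens in range(10)]
for row in table:
    print(" ".join(str(v) for v in row))

# Collect every form that appears anywhere in the table.
reached = {v for row in table for v in row}
print("forms reached:", sorted(reached))
```

If the reading is right, form 3 should be missing from the reached set even though nplurals=4 declares it.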

This rule can be written in regex as follows (note that there is an implicit "and not any of the above", although it doesn't make a difference):
- "[023456789]1" => "Transifex one"
- "[023456789][234]" => "Transifex few"
- "1.|.[056789]" => "Transifex many"
- else => "Transifex other"
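
To double-check that these patterns really say the same thing as the formula, one could compare the two over a range of numbers. The sketch below is only an illustration and rests on an assumption that is not spelled out above: the patterns are matched against the last two decimal digits, zero-padded so that 5 is seen as "05". The names `form_by_regex` and `form_by_formula` are hypothetical.

```python
import re

# Patterns from the list above, tried in order; the index is the plural form.
RULES = [
    (re.compile(r"[023456789]1"), 0),      # "one"
    (re.compile(r"[023456789][234]"), 1),  # "few"
    (re.compile(r"1.|.[056789]"), 2),      # "many"
]

def form_by_regex(n):
    # Assumption: match against the zero-padded last two digits.
    last_two = "%02d" % (n % 100)
    for pattern, form in RULES:
        if pattern.fullmatch(last_two):
            return form
    return 3  # "other"

def form_by_formula(n):
    # Hand translation of the Plural-Forms expression quoted in this thread.
    if n % 10 == 1 and n % 100 != 11:
        return 0
    if 2 <= n % 10 <= 4 and (n % 100 < 12 or n % 100 > 14):
        return 1
    if n % 10 == 0 or 5 <= n % 10 <= 9 or 11 <= n % 100 <= 14:
        return 2
    return 3

# Numbers where the two classifications disagree.
mismatches = [n for n in range(1000) if form_by_regex(n) != form_by_formula(n)]

# Two-digit strings matched by more than one pattern; if there are none,
# the implicit "and not any of the above" indeed makes no difference.
overlaps = [
    n for n in range(100)
    if sum(bool(p.fullmatch("%02d" % n)) for p, _ in RULES) > 1
]

print("mismatches:", mismatches)
print("overlaps:", overlaps)
```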

Hmm, these "one"-"few"-"many"-"other" reminded me of CLDR. And indeed, if you look at CLDR's plurals table[0], there's a 4th form applicable to floating-point numbers. My hypothesis is that this Plural-Forms is a result of a botched automatic conversion from CLDR data.
[0] http://www.unicode.org/cldr/charts/latest/supplemental/language_plural_rules.html#ru

Now, it would be cool if i18nspector explained better what is wrong here. [snip] I hope to implement this in the future.

Sounds awesome! However, I was still able to understand that *something* about the expression was fishy, but didn't understand that i18nspector is able to detect issues like this.

i18nspector has a small database of "correct" Plural-Forms for the most "popular" languages (including Russian). So all it knew was that your Plural-Forms was different from what the rest of the world uses.

(It does have other Plural-Forms correctness checks that don't require any linguistic data, but they didn't trigger in this case.)

(Doesn't that essentially require a SAT-solver?)

Theoretically, yes, checking that a plural expression never evaluates to a certain value is NP-hard.

But in practice almost all real-world Plural-Forms are structured similarly to what we saw in this thread, making them easy to analyse. So I intend to implement something like this:

1) Try to prove that f(i + 100) == f(i) for all i > 100.

2) If we're able to prove it, then we know that the image of f is equal to {f(0), f(1), ..., f(199), f(200)}.

3) Otherwise, assume that the Plural-Forms is okay. (Alternatively: assume that the Plural-Forms is so unusual that it's almost certainly broken.)
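
The three steps could be sketched roughly like this in Python. This is not i18nspector's code, only an illustration under simplifying assumptions: the "proof" in step 1 is a purely syntactic and deliberately conservative check that every occurrence of n is written as `n % k` with k dividing 100, and the expression is supplied twice, once as the C-style string and once as a hand-written Python callable. The names `is_periodic_100` and `unreachable_forms` are hypothetical.

```python
import re

def is_periodic_100(expr):
    """Step 1 (conservative): accept only if every `n` appears as `n % k`, k | 100."""
    for m in re.finditer(r"\bn\b", expr):
        # Reject `a / n % k` and friends, where `%` would not apply to n alone.
        before = expr[:m.start()].rstrip()
        if before and before[-1] in "*/%":
            return False
        mod = re.match(r"\s*%\s*(\d+)", expr[m.end():])
        if mod is None:
            return False
        k = int(mod.group(1))
        if k == 0 or 100 % k != 0:
            return False
    return True

def unreachable_forms(expr, f, nplurals):
    """Steps 2 and 3: forms that f never produces, or None if nothing was proved."""
    if not is_periodic_100(expr):
        return None  # step 3: the caller decides which assumption to make
    image = {f(i) for i in range(201)}  # {f(0), f(1), ..., f(200)}
    return set(range(nplurals)) - image

# The Russian expression from this thread, plus a hand translation of it.
RU_EXPR = ("(n%10==1 && n%100!=11 ? 0 : n%10>=2 && n%10<=4 && "
           "(n%100<12 || n%100>14) ? 1 : n%10==0 || (n%10>=5 && n%10<=9) || "
           "(n%100>=11 && n%100<=14)? 2 : 3)")

def ru(n):
    if n % 10 == 1 and n % 100 != 11:
        return 0
    if 2 <= n % 10 <= 4 and (n % 100 < 12 or n % 100 > 14):
        return 1
    if n % 10 == 0 or 5 <= n % 10 <= 9 or 11 <= n % 100 <= 14:
        return 2
    return 3

print(unreachable_forms(RU_EXPR, ru, 4))
```

A real checker would parse the expression once and evaluate the parsed tree, rather than relying on a separate hand translation that could drift from the string.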

--
Jakub Wilk
