* Ben Wiederhake <benwiederh...@gmx.de>, 2016-01-02, 23:13:
The Russian PO file reads:
Plural-Forms: nplurals=4; plural=(n%10==1 && n%100!=11 ? 0 : n%10>=2
&& n%10<=4 && (n%100<12 || n%100>14) ? 1 : n%10==0 || (n%10>=5 &&
n%10<=9) || (n%100>=11 && n%100<=14)? 2 : 3);
[...]
Even though I don't speak Russian, I can tell that this Plural-Forms
can't possibly be correct. Here 4 plural forms are declared, but the
expression never evaluates to 3.
Since it's just modular arithmetic, one can just parse the formula to
fill out a 10x10 table as a "proof". I did that, and came to the same
result as you do, without even looking at your program. (Originally I
assumed a precedence error / parsing issue / whatever, so I didn't
want to start reading C code ... sorry.)
For the record, here's my interpretation, with parentheses added:
((n%10==1 && n%100!=11)
? 0
: ((n%10>=2 && n%10<=4 && (n%100<12 || n%100>14))
? 1
: ((n%10==0 || (n%10>=5 && n%10<=9) || (n%100>=11 && n%100<=14))
? 2
: 3)))
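The 10x10 table mentioned above can be reproduced with a few lines of Python. This is only a sketch: plural() is a hand-made transcription of the parenthesized expression, not code taken from i18nspector or gettext, and the names are made up for illustration.

```python
from collections import Counter

def plural(n: int) -> int:
    """Hand transcription of the quoted Plural-Forms expression."""
    if n % 10 == 1 and n % 100 != 11:
        return 0
    if 2 <= n % 10 <= 4 and (n % 100 < 12 or n % 100 > 14):
        return 1
    if n % 10 == 0 or 5 <= n % 10 <= 9 or 11 <= n % 100 <= 14:
        return 2
    return 3

# The expression only looks at n % 10 and n % 100, so 0..99 covers every case.
# Rows are the tens digit, columns are the units digit.
table = [[plural(10 * tens + units) for units in range(10)]
         for tens in range(10)]
for row in table:
    print(" ".join(str(form) for form in row))

# How often each form occurs in the table; form 3 never shows up.
counts = Counter(form for row in table for form in row)
print(sorted(counts.items()))
```

Since the table contains only 0, 1 and 2, the fourth declared form is unreachable for any non-negative integer n.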
This rule can be written in regex as follows (note that there is an
implicit "and not any of the above", although it doesn't make a
difference):
- "[023456789]1" => "Transifex one"
- "[023456789][234]" => "Transifex few"
- "1.|.[056789]" => "Transifex many"
- else => "Transifex other"
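The regex reading can be cross-checked against the arithmetic one. The sketch below assumes the patterns are meant to match the last two decimal digits of n, zero-padded (so 5 becomes "05"); that is my reading of the list above, and plural_by_regex() is a hypothetical helper, not an existing API.

```python
import re

def plural(n: int) -> int:
    """Hand transcription of the quoted Plural-Forms expression."""
    if n % 10 == 1 and n % 100 != 11:
        return 0
    if 2 <= n % 10 <= 4 and (n % 100 < 12 or n % 100 > 14):
        return 1
    if n % 10 == 0 or 5 <= n % 10 <= 9 or 11 <= n % 100 <= 14:
        return 2
    return 3

# Patterns from the list above, tried in order: one, few, many.
RULES = [
    (re.compile(r"[023456789]1"), 0),
    (re.compile(r"[023456789][234]"), 1),
    (re.compile(r"1.|.[056789]"), 2),
]

def plural_by_regex(n: int) -> int:
    """Classify n by matching its last two digits (assumed zero-padded)."""
    last_two = f"{n % 100:02d}"
    for pattern, form in RULES:
        if pattern.fullmatch(last_two):
            return form
    return 3  # "else" branch

# Numbers on which the two formulations disagree (expected: none).
mismatches = [n for n in range(1000) if plural(n) != plural_by_regex(n)]
print(mismatches)
```

Under that assumption the two formulations agree, and the "else" branch is never taken.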
Hmm, these "one"-"few"-"many"-"other" reminded me about CLDR. And indeed,
if you look at CLDR's plurals table[0], there's a 4th form applicable to
floating-point numbers. My hypothesis is that this Plural-Forms is a
result of a botched automatic conversion from CLDR data.
[0] http://www.unicode.org/cldr/charts/latest/supplemental/language_plural_rules.html#ru
Now, it would be cool if i18nspector explained better what is wrong
here. [snip] I hope to implement this in the future.
Sounds awesome! However, I was still able to understand that
*something* about the expression was fishy, but didn't understand that
i18nspector is able to detect issues like this.
i18nspector has a small database of "correct" Plural-Forms for the most
"popular" languages (including Russian). So all it knew was that your
Plural-Forms was different from what the rest of the world uses.
(It does have other Plural-Forms correctness checks that don't require
any linguistic data, but they didn't trigger in this case.)
(Doesn't that essentially require a SAT-solver?)
Theoretically, yes, checking that a plural expression never evaluates to
a certain value is NP-hard.
But in practice almost all real-world Plural-Forms are structured
similarly to what we saw in this thread, making them easy to analyse. So
I intend to implement something like this:
1) Try to prove that f(i + 100) == f(i) for all i > 100.
2) If we're able to prove it, then we know that the image of f is equal
to {f(0), f(1), ..., f(199), f(200)}.
3) Otherwise, assume that the Plural-Forms is okay. (Alternatively:
assume that the Plural-Forms is so unusual that it's almost certainly broken.)
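Step 2 of the plan above could look roughly like the following. This is a sketch, not i18nspector's code: it takes the periodicity from step 1 for granted rather than proving it, and image_if_periodic() and unused_forms() are hypothetical names.

```python
from typing import Callable, Set

def plural(n: int) -> int:
    """Hand transcription of the Plural-Forms expression from this thread."""
    if n % 10 == 1 and n % 100 != 11:
        return 0
    if 2 <= n % 10 <= 4 and (n % 100 < 12 or n % 100 > 14):
        return 1
    if n % 10 == 0 or 5 <= n % 10 <= 9 or 11 <= n % 100 <= 14:
        return 2
    return 3

def image_if_periodic(f: Callable[[int], int], period: int = 100) -> Set[int]:
    """Image of f, ASSUMING f(i + period) == f(i) for all i > period.

    The assumption (step 1) is not checked here; with it, every value
    f takes already occurs somewhere in f(0), ..., f(2 * period).
    """
    return {f(i) for i in range(2 * period + 1)}

def unused_forms(f: Callable[[int], int], nplurals: int,
                 period: int = 100) -> Set[int]:
    """Declared plural forms that f never produces (same assumption)."""
    return set(range(nplurals)) - image_if_periodic(f, period)

# nplurals=4 is what the PO file in this thread declares.
print(unused_forms(plural, nplurals=4))
```

For the expression in this thread, the only unused form is 3, which matches the table-based result.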
--
Jakub Wilk