Re: printing plural values outside unsigned long int range

Bruno Haible Thu, 17 Aug 2006 05:35:37 -0700

Paul Eggert wrote:
> Here are proposed patches to the gettext manual to address some of
> the problems I ran into with coreutils


Thanks for proposing documentation for this. Indeed these topics were
already discussed on the mailing lists but not yet documented.

> It'd be nice to have an ngettext variant that works for uintmax_t (and
> even intmax_t and/or long double, so long as I'm asking for the moon
> :-).

Now that's asking for too much, since the workaround is so simple.

> +When translating format strings, it is usually better to use the same
> +conversion specifications in @var{msgid1} and @var{msgid2}, as this
> +simplifies the translator's job.  For example:
> +
> +@smallexample
> +/* Avoid usages like this one.  */
> +printf (ngettext ("one file removed", "%d files removed", n), n);
> +@end smallexample

I don't see a reason to avoid such usages. It is perfectly valid, and
even good style, to omit the number "1" if you can write it out as "one".
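A minimal sketch of such a usage, assuming glibc's ngettext with no
message catalog loaded, so the call falls back to msgid1 when n is 1 and
to msgid2 otherwise. The helper format_removed is hypothetical. GCC's
-Wformat checking may warn about the unused argument for the singular
string, but the call itself is well defined C.

```c
#include <libintl.h>
#include <stdio.h>

/* Hypothetical helper: format the "files removed" message for N into
   BUF and return the length that snprintf reports.  The singular msgid
   spells the number out as "one" and has no conversion, so for N == 1
   the argument N is evaluated but otherwise ignored, as the C standard
   permits for surplus printf arguments.  */
int
format_removed (char *buf, size_t size, unsigned long n)
{
  return snprintf (buf, size,
                   ngettext ("one file removed", "%lu files removed", n),
                   n);
}
```

Without a catalog, N == 1 yields "one file removed" and N == 3 yields
"3 files removed".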

> +Most people use languages for which this solution is adequate.  There
> +may be some trouble with languages which have the notion of a greater
> +plural, which refers to an abnormally large number for the object of
> +discussion, but an idiomatic solution to this problem is beyond the
> +current scope of the design.  For more on the issue of plurals, please
> +see @uref{http://en.wikipedia.org/wiki/Plural, Wikipedia's entry for
> +Plural}.

Thanks for the pointer. Austronesian languages are currently out of scope
for gettext, due to their small number of computer users. As for Malagasy
(Malgache): we would need the advice of a native speaker of that language,
telling us whether the distinction between "paucal", "greater paucal" and
"plural" is mandatory or optional. Until then, there's no point in
documenting it in the gettext manual.

> +However, there is an implementation problem in invoking functions like
> +@code{ngettext}: what should the programmer do if the number to be
> +printed is not equal to an @code{unsigned long int} value?  Here are
> +some heuristics to address this problem:
> +
> +@itemize @bullet
> +@item
> +If the number is floating point, or might be negative, use generic
> +wording without @code{ngettext}.

Yes, I agree with that advice.

> One way to do this is to use SI
> +symbols (e.g., @samp{gettext ("length = %g mm")}), as these are not
> +supposed to use plural forms anyway.  Avoid formats like @samp{"%g
> +seconds"}, which might expand to ``1 seconds''.

Whether you use SI symbols or not ("s" vs. "sec" vs. "seconds") is not
important here.  What matters more is whether the user expects a fractional
value, and whether you print some digits after the decimal point.
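A sketch of the generic-wording approach, using the "Time elapsed"
message from the patch below. It assumes no message catalog is loaded and
the default "C" locale (so the decimal point is "."); the helper
format_elapsed is hypothetical.

```c
#include <libintl.h>
#include <stdio.h>

/* Hypothetical helper: report an elapsed time with plain gettext rather
   than ngettext.  Three digits are always printed after the decimal
   point, so the reader sees a fractional quantity, and the wording
   "seconds" remains acceptable even when the value is 1.000.  */
int
format_elapsed (char *buf, size_t size, unsigned long num_milliseconds)
{
  return snprintf (buf, size, gettext ("Time elapsed: %.3f seconds"),
                   num_milliseconds * 0.001);
}
```

Called with 1000 milliseconds, it stores "Time elapsed: 1.000 seconds".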

> +@item
> +If a nonnegative integer argument might be too large to fit in
> +@code{unsigned long int}, reduce it to a value in range with code like
> +this:
> +
> +@example
> +@group
> +unsigned long int
> +select_plural (uintmax_t n)
> +@{
> +  return (n <= ULONG_MAX ? n : n % 1000 + 1000);
> +@}
> +@end group
> +
> +@group
> +void
> +print_file_size (uintmax_t bytes)
> +@{
> +  printf (ngettext ("%"PRIuMAX" byte", "%"PRIuMAX" bytes",
> +                    select_plural (bytes)),
> +          bytes);
> +@}
> +@end group
> +@end example

I used a similar, smaller example with entire sentences. "%lu bytes" is not
a sentence.
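A sketch of the reduction as done in the patch below (six decimal digits,
offset 1000000), with entire sentences as messages. The names
plural_operand and print_file_size are hypothetical, and the extra LIMIT
parameter is an assumption added only so the reduction branch can be
exercised on platforms where unsigned long is already as wide as
uintmax_t.

```c
#include <inttypes.h>
#include <limits.h>
#include <libintl.h>
#include <stdio.h>

/* Hypothetical helper: map N to a value that can be passed to ngettext
   without wrapping around.  LIMIT would normally be ULONG_MAX and must
   not exceed it; it is a parameter only so the reduction branch can be
   exercised where unsigned long is as wide as uintmax_t.  Values above
   LIMIT keep their last six decimal digits and are shifted into
   [1000000, 1999999], so they never collapse to 0 or 1.  */
unsigned long
plural_operand (uintmax_t n, uintmax_t limit)
{
  return n > limit ? (unsigned long) (n % 1000000 + 1000000)
                   : (unsigned long) n;
}

/* Hypothetical caller: the number itself is printed with PRIuMAX, and
   only the plural-form selection uses the reduced value.  */
void
print_file_size (uintmax_t nbytes)
{
  printf (ngettext ("The file has %" PRIuMAX " byte.",
                    "The file has %" PRIuMAX " bytes.",
                    plural_operand (nbytes, ULONG_MAX)),
          nbytes);
}
```

Keeping the printed number and the selector value separate means the
user always sees the exact count, even when the selector was reduced.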

Bruno


*** gettext-cvs/gettext-tools/doc/gettext.texi  Thu Aug 17 03:51:46 2006
--- gettext-6/gettext-tools/doc/gettext.texi    Thu Aug 17 04:28:03 2006
***************
*** 5479,5484 ****
--- 5479,5526 ----
  @end table
  @end table
  
+ You might now ask: @code{ngettext} handles only numbers @var{n} of type
+ @samp{unsigned long}.  What about larger integer types?  What about negative
+ numbers?  What about floating-point numbers?
+ 
+ About larger integer types, such as @samp{uintmax_t} or 
+ @samp{unsigned long long}: they can be handled by reducing the value to a
+ range that fits in an @samp{unsigned long}.  Simply casting the value to
+ @samp{unsigned long} would not do the right thing, since it would treat
+ @code{ULONG_MAX + 1} like zero, @code{ULONG_MAX + 2} like singular, and
+ the like.  Here you can exploit the fact that all mentioned plural form
+ formulas eventually become periodic, with a period that is a divisor of 100
+ (or 1000 or 1000000).  So, when you reduce a large value to another one in
+ the range [1000000, 1999999] that ends in the same 6 decimal digits, you
+ can assume that it will lead to the same plural form selection.  This code
+ does this:
+ 
+ @smallexample
+ #include <inttypes.h>
+ uintmax_t nbytes = ...;
+ printf (ngettext ("The file has %"PRIuMAX" byte.",
+                   "The file has %"PRIuMAX" bytes.",
+                   nbytes > ULONG_MAX ? (nbytes % 1000000) + 1000000 : nbytes),
+         nbytes);
+ @end smallexample
+ 
+ Negative and floating-point values usually represent physical entities for
+ which singular and plural don't clearly apply.  In such cases, there is no
+ need to use @code{ngettext}; a simple @code{gettext} call with a form suitable
+ for all values will do.  For example:
+ 
+ @smallexample
+ printf (gettext ("Time elapsed: %.3f seconds"), num_milliseconds * 0.001);
+ @end smallexample
+ 
+ @noindent
+ Even if @var{num_milliseconds} happens to be a multiple of 1000, the output
+ @smallexample
+ Time elapsed: 1.000 seconds
+ @end smallexample
+ @noindent
+ is acceptable in English, and similarly for other languages.
+ 
  @node Optimized gettext,  , Plural forms, gettext
  @subsection Optimization of the *gettext functions
  @cindex optimization of @code{gettext} functions
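The patch relies on plural-form formulas being periodic in the last few
decimal digits. A small check of that claim for one formula, assuming the
Polish Plural-Forms expression listed in the gettext manual; the function
names are hypothetical. It also shows why the offset of 1000000 matters:
without it, 1000001 would be reduced to 1 and treated as singular.

```c
#include <stdint.h>

/* Plural-form selector for Polish, transcribed from the Plural-Forms
   expression listed for Polish in the gettext manual:
     n==1 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2  */
static int
polish_plural (uintmax_t n)
{
  if (n == 1)
    return 0;
  if (n % 10 >= 2 && n % 10 <= 4 && (n % 100 < 10 || n % 100 >= 20))
    return 1;
  return 2;
}

/* Nonzero if the reduction used in the patch (keep the last six
   decimal digits, then add 1000000) selects the same form as N.
   Only meaningful for large N, the only case where the patch
   applies the reduction.  */
int
reduction_preserves_form (uintmax_t n)
{
  return polish_plural (n) == polish_plural (n % 1000000 + 1000000);
}

/* The same check without the offset.  It fails for 1000001, which
   would be reduced to 1 and treated as singular.  */
int
naive_reduction_preserves_form (uintmax_t n)
{
  return polish_plural (n) == polish_plural (n % 1000000);
}
```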


_______________________________________________
Bug-coreutils mailing list
Bug-coreutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-coreutils
