bug#47103: numfmt: invalid suffix 'k'

Pádraig Brady Sun, 26 Nov 2023 06:59:28 -0800

On 25/11/2023 21:27, Sven Köhler wrote:

Not only --from=si is broken. Also --to=si is broken:


$ numfmt --to=si 3000
3,0K

In order to not break backwards compatibility, you probably have to
introduce a switch --lowercase-kilo such that --to=si produces proper SI
compliant output. Then have --from=si accept both uppercase and lowercase k.

I have to say, that uppercause K is quite common, but it is not correct
as far as SI prefixes are concerned.

Also note that Ki in iec-i mode is quite correct. I'm torn about iec
mode. I believe that people silently switch 1000 for 1024 and use the
lower case k as well as uppercase K. Maybe numfmt should have an option
to accept/produce both here as well?

Is there really a standard/specification that allows k/K for 1024?
Wikipedia only lists Ki as IEC prefixes.


I was thinking we only supported uppercase K for compat with output from 
existing coreutils.
But in fact it's quite the opposite. Other coreutils output lowercase k
when operating in SI mode. For e.g. this gives an error:

  $ ls -lh --si /bin/ | numfmt --from=si --field=5
  numfmt: invalid suffix in input: ‘54k’

So we should at least accept lowercase k.

As for outputting lowercase k for the SI case, the coreutils texinfo has the 
following
in relation to these Kilo prefixes:

  ‘kB’
       kilobyte: 10^3 = 1000.
  ‘k’
  ‘K’
  ‘KiB’
       kibibyte: 2^{10} = 1024.  ‘K’ is special: the SI prefix is ‘k’ and
       the ISO/IEC 80000-13 prefix is ‘Ki’, but tradition and POSIX use
       ‘k’ to mean ‘KiB’.

So one might be conservative here and keep outputting uppercase K in SI mode.
However the above is really in relation to specifying block sizes, to df or dd 
etc.,
so we should probably output lower case k for consistency with other coreutils 
at least.
We could be conservative here and have a new --to=Si option (note the casing)
to explicitly select/allow variable cased SI prefixes, but I'm not sure that's 
needed.

For IEC mode, we should could just allow uppercase K,
but it's simpler and more flexible to accept lowercase k here, without much 
ambiguity.

As for not allowing uppercase K in SI mode, that's probably overkill,
and would cause more problems than it would solve. One edge case
it would solve is when working with a Kelvin suffix, to avoid
the ambiguity in the first case of the following.
That's too much of an edge case to worry about I think:

  $ numfmt --suffix=K --from=si 500K
  500K
  $ numfmt --suffix=K --from=si 500M
  500000000K
  $ numfmt --suffix=K --from=si 500KK
  500000K


The attached make the adjustment to allow 'k' always,
and output 'k' in SI mode. Tests will need adjusting,
but no need to clutter the discussion patch with that.

cheers,
Pádraig.

diff --git a/src/numfmt.c b/src/numfmt.c
index a5bdd2f4f..3a684e709 100644
--- a/src/numfmt.c
+++ b/src/numfmt.c
@@ -228,7 +228,7 @@ default_scale_base (enum scale_type scale)
     }
 }
 
-static char const zero_and_valid_suffixes[] = "0KMGTPEZYRQ";
+static char const zero_and_valid_suffixes[] = "0KkMGTPEZYRQ";
 static char const *valid_suffixes = 1 + zero_and_valid_suffixes;
 
 static inline bool
@@ -242,6 +242,7 @@ suffix_power (const char suf)
 {
   switch (suf)
     {
+    case 'k':                  /* kilo.  */
     case 'K':                  /* kilo or kibi.  */
       return 1;
 
@@ -811,7 +812,7 @@ double_to_human (long double val, int precision,
   int prec = user_precision == -1 ? show_decimal_point : user_precision;
 
   return snprintf (buf, buf_size, fmt, prec, val,
-                   suffix_power_char (power),
+                   power == 1 && scale == scale_SI ? "k" : suffix_power_char (power),
                    &"i"[! (scale == scale_IEC_I && 0 < power)],
                    suffix ? suffix : "");
 }
@@ -946,12 +947,13 @@ UNIT options:\n"), stdout);
       fputs (_("\
   auto       accept optional single/two letter suffix:\n\
                1K = 1000,\n\
+               1k = 1000,\n\
                1Ki = 1024,\n\
                1M = 1000000,\n\
                1Mi = 1048576,\n"), stdout);
       fputs (_("\
   si         accept optional single letter suffix:\n\
-               1K = 1000,\n\
+               1k = 1000,\n\
                1M = 1000000,\n\
                ...\n"), stdout);
       fputs (_("\

bug#47103: numfmt: invalid suffix 'k'

Reply via email to