On 25/11/2023 21:27, Sven Köhler wrote:
Not only --from=si is broken. Also --to=si is broken:$ numfmt --to=si 3000 3,0K In order to not break backwards compatibility, you probably have to introduce a switch --lowercase-kilo such that --to=si produces proper SI compliant output. Then have --from=si accept both uppercase and lowercase k. I have to say, that uppercause K is quite common, but it is not correct as far as SI prefixes are concerned. Also note that Ki in iec-i mode is quite correct. I'm torn about iec mode. I believe that people silently switch 1000 for 1024 and use the lower case k as well as uppercase K. Maybe numfmt should have an option to accept/produce both here as well? Is there really a standard/specification that allows k/K for 1024? Wikipedia only lists Ki as IEC prefixes.
I was thinking we only supported uppercase K for compat with output from existing coreutils. But in fact it's quite the opposite. Other coreutils output lowercase k when operating in SI mode. For e.g. this gives an error: $ ls -lh --si /bin/ | numfmt --from=si --field=5 numfmt: invalid suffix in input: ‘54k’ So we should at least accept lowercase k. As for outputting lowercase k for the SI case, the coreutils texinfo has the following in relation to these Kilo prefixes: ‘kB’ kilobyte: 10^3 = 1000. ‘k’ ‘K’ ‘KiB’ kibibyte: 2^{10} = 1024. ‘K’ is special: the SI prefix is ‘k’ and the ISO/IEC 80000-13 prefix is ‘Ki’, but tradition and POSIX use ‘k’ to mean ‘KiB’. So one might be conservative here and keep outputting uppercase K in SI mode. However the above is really in relation to specifying block sizes, to df or dd etc., so we should probably output lower case k for consistency with other coreutils at least. We could be conservative here and have a new --to=Si option (note the casing) to explicitly select/allow variable cased SI prefixes, but I'm not sure that's needed. For IEC mode, we should could just allow uppercase K, but it's simpler and more flexible to accept lowercase k here, without much ambiguity. As for not allowing uppercase K in SI mode, that's probably overkill, and would cause more problems than it would solve. One edge case it would solve is when working with a Kelvin suffix, to avoid the ambiguity in the first case of the following. That's too much of an edge case to worry about I think: $ numfmt --suffix=K --from=si 500K 500K $ numfmt --suffix=K --from=si 500M 500000000K $ numfmt --suffix=K --from=si 500KK 500000K The attached make the adjustment to allow 'k' always, and output 'k' in SI mode. Tests will need adjusting, but no need to clutter the discussion patch with that. cheers, Pádraig.
diff --git a/src/numfmt.c b/src/numfmt.c index a5bdd2f4f..3a684e709 100644 --- a/src/numfmt.c +++ b/src/numfmt.c @@ -228,7 +228,7 @@ default_scale_base (enum scale_type scale) } } -static char const zero_and_valid_suffixes[] = "0KMGTPEZYRQ"; +static char const zero_and_valid_suffixes[] = "0KkMGTPEZYRQ"; static char const *valid_suffixes = 1 + zero_and_valid_suffixes; static inline bool @@ -242,6 +242,7 @@ suffix_power (const char suf) { switch (suf) { + case 'k': /* kilo. */ case 'K': /* kilo or kibi. */ return 1; @@ -811,7 +812,7 @@ double_to_human (long double val, int precision, int prec = user_precision == -1 ? show_decimal_point : user_precision; return snprintf (buf, buf_size, fmt, prec, val, - suffix_power_char (power), + power == 1 && scale == scale_SI ? "k" : suffix_power_char (power), &"i"[! (scale == scale_IEC_I && 0 < power)], suffix ? suffix : ""); } @@ -946,12 +947,13 @@ UNIT options:\n"), stdout); fputs (_("\ auto accept optional single/two letter suffix:\n\ 1K = 1000,\n\ + 1k = 1000,\n\ 1Ki = 1024,\n\ 1M = 1000000,\n\ 1Mi = 1048576,\n"), stdout); fputs (_("\ si accept optional single letter suffix:\n\ - 1K = 1000,\n\ + 1k = 1000,\n\ 1M = 1000000,\n\ ...\n"), stdout); fputs (_("\