The attached patchset addresses a minor issue with program behavior vs. documentation of the df, du, and ls tools from coreutils-8.32, when using the --si option.
It resurrects an issue that was brought up in 2014 [3] and eventually
closed in 2018 [4] with a wontfix (after minimal discussion in the
intervening time).

Summary
-------

Output from the df, du, and ls tools with the --si option displays results
using single-letter unit suffixes "k", "M", "G", etc., rather than "kB",
"MB", "GB".

The authoritative documentation as to expected behavior with --si is
self-contradictory: the program behavior is consistent with the
subsections of coreutils.info pertaining to those individual tools, but
is directly contradicted by the behavior specified in Section 2.3, which
specifically concentrates on describing how the various block size
options behave.

The patchset brings the behavior into accordance with the behavior
documented in Section 2.3.

Examples:

   #
   # Behavior with unmodified coreutils-8.32:
   #
   $ df --si /mnt/test
   Filesystem      Size  Used Avail Use% Mounted on
   /dev/sdb5       500M  282M  214M  57% /mnt/test

   $ du --si /mnt/test/foo
   40k     /mnt/test/foo

   $ ls --si -l /mnt/test/foo
   -rwxr-xr-x 1 root root 40k Sep  8 07:42 /mnt/test/foo

   #
   # Behavior with attached patchset applied to coreutils-8.32:
   #
   $ df --si /mnt/test
   Filesystem      Size  Used Avail Use% Mounted on
   /dev/sdb5      500MB 282MB 214MB  57% /mnt/test

   $ du --si /mnt/test/foo
   40kB    /mnt/test/foo

   $ ls --si -l /mnt/test/foo
   -rwxr-xr-x 1 root root 40kB Sep  8 07:42 /mnt/test/foo

Background and history
----------------------

In what follows, "M" (mega) is used as an example unit of measurement;
the same applies to the other suffix units, k, G, etc.

"SI option" means the behavior observed using any of the following:

   Option --si
   Option --block-size=si
   Environment variable BLOCKSIZE=si
   Environment variable BLOCK_SIZE=si
   Environment variable DF_BLOCK_SIZE=si

Doc sections cited below refer to coreutils.info from coreutils-8.32.

The main doc vs.
behavior discrepancies are as follows:

* Section 2.3, which is an overview discussion of the semantics of the
  various block size options and nomenclature, states unequivocally
  that when the SI option is specified, the results are expressed using
  suffix MB, that MB means 1000^2, and that bare M means 1024^2:

     "With human-readable formats, output sizes are followed by a size
      letter such as ‘M’ for megabytes. ‘BLOCK_SIZE=human-readable’
      uses powers of 1024; ‘M’ stands for 1,048,576 bytes.
      ‘BLOCK_SIZE=si’ is similar, but uses powers of 1000 and appends
      ‘B’; ‘MB’ stands for 1,000,000 bytes."

* Sections 10.1.2, 14.1, and 14.2 (the subsections pertaining
  specifically to ls, df, and du) state just the opposite: that the SI
  option uses bare ("B-less") suffixes, and that the underlying
  representation base implied by the bare suffixes is decimal:

     "--si
      Append an SI-style abbreviation to each size, such as ‘M’ for
      megabytes. Powers of 1000 are used, not 1024; ‘M’ stands for
      1,000,000 bytes."

* Subsection 26.2 (which pertains specifically to numfmt) further
  confuses the issue by giving an example
  ("e.g. ‘4G’ ↦ ‘4,000,000,000’"), which contradicts Section 2.3 by
  implying that a bare suffix means decimal base.

* The "coreutils gotchas" blurb [2] (which is linked from [1], hence
  can presumably be considered authoritative) agrees with
  coreutils.info Section 2.3 in the semantics of M vs. MB, but doesn't
  specifically say anything about the SI option.

There is no dispute (known to me) that the numerical values displayed
when using the SI option are indeed based on decimal base, which
everyone seems to agree is what is desired for that option. The issue
is solely whether the string suffix applied to the numerical values
ought to be M or MB.

As was pointed out in the original thread [4], the numfmt tool provides
a workaround for this issue. But since the issue exists in its own
right as an inconsistency between program behavior vs.
doc (and between various docs, regardless of which behavior is deemed
correct) it seems like addressing it in some form or another ought to
be at least considered as an option, despite the numfmt workaround.

Effect of patch on build tests
------------------------------

The proposed patch does cause one du build-time test to fail
("test/du/inodes") but this is simply because that test expects the SI
option to produce output with a bare suffix rather than with the
B-appended suffix as specified by coreutils.info Section 2.3 (which is
what the patch hews to). So if the proposed patch is accepted, that
test would also have to be updated to agree with the changed expected
semantics.

Comments
--------

This is surely a minor issue; my only motivation for bringing it up
again is simply that I just got burned by it, and during the
figuring-out-why phase of looking thru the code and doc, it seemed like
a reasonably simple patch might be able to take care of it for all
three involved programs (df, du, ls) without causing too much
side-effect grief, so figured might as well submit it and see if you
agree. There may of course be subtleties I've missed that make this
simple-seeming fix unworkable.

And of course an important consideration for "fixing" output formats of
tools that are as widely used as these is how much global breakage
would result to the numerous scripts in the wild that scrape output
from them. The other side of that is that the proposed patch affects
behavior only when the SI option is used, which (I'm guessing) is
probably not very often.

I totally get that, so am not advocating strongly that it ought to be
"fixed" along the lines suggested by the patch, only that it should be
re-considered as an option and discussed, rather than wontfix-ing it
right off the bat, just because that was how it was previously
handled [4].
I suspect that some of the documentation inconsistencies pointed out
above were either not present or not appreciated when the issue was
wontfixed/closed in 2018.

If there is agreement to accept the patchset -- which presently patches
only the code behavior, not the doc or the build tests -- let me know,
and I'll be glad to propose an updated patchset that attempts to
address the associated documentation as well, i.e. brings the various
subsections of coreutils.info into a self-consistent state, and
modifies the failing du test appropriately.

References
----------

[1] "Coreutils - GNU core utilities", top-level coreutils page,
    https://www.gnu.org/software/coreutils/

[2] "Coreutils gotchas", subsection on "Unit representations",
    https://www.pixelbeat.org/docs/coreutils-gotchas.html

[3] Prior thread from Aug. 2014, same topic:
    https://lists.gnu.org/archive/html/bug-coreutils/2014-08/msg00022.html

[4] Prior thread from Oct. 2018, same topic:
    https://lists.gnu.org/archive/html/bug-coreutils/2018-10/msg00131.html

*** coreutils-8.32/src/df.c	2020-01-01 07:29:37.000000000 -0700
--- coreutils-8.32-gdg1/src/df.c	2020-09-06 07:15:28.842689578 -0600
***************
*** 1627,1631 ****
            break;
          case 'H':
!           human_output_opts = human_autoscale | human_SI;
            output_block_size = 1;
            break;
--- 1627,1632 ----
            break;
          case 'H':
!           // GDG: Add 'human_B'
!           human_output_opts = human_autoscale | human_SI | human_B;
            output_block_size = 1;
            break;
*** coreutils-8.32/src/du.c	2020-01-01 07:34:20.000000000 -0700
--- coreutils-8.32-gdg1/src/du.c	2020-09-06 08:07:01.619267684 -0600
***************
*** 798,802 ****

          case HUMAN_SI_OPTION:
!           human_output_opts = human_autoscale | human_SI;
            output_block_size = 1;
            break;
--- 798,803 ----

          case HUMAN_SI_OPTION:
!           // GDG: Add human_B
!           human_output_opts = human_autoscale | human_SI | human_B;
            output_block_size = 1;
            break;
*** coreutils-8.32/src/ls.c	2020-03-01 05:30:46.000000000 -0700
--- coreutils-8.32-gdg1/src/ls.c	2020-09-06 09:30:39.371619899 -0600
***************
*** 2267,2272 ****

          case SI_OPTION:
            file_human_output_opts = human_output_opts =
!             human_autoscale | human_SI;
            file_output_block_size = output_block_size = 1;
            break;
--- 2267,2273 ----

          case SI_OPTION:
+           // GDG: Add human_B
            file_human_output_opts = human_output_opts =
!             human_autoscale | human_SI | human_B;
            file_output_block_size = output_block_size = 1;
            break;
*** coreutils-8.32/lib/human.c	2020-01-01 07:14:23.000000000 -0700
--- coreutils-8.32-gdg1/lib/human.c	2020-09-06 07:26:12.465546556 -0600
***************
*** 400,404 ****
  {
    human_autoscale + human_SI + human_base_1024,
!   human_autoscale + human_SI
  };

--- 400,405 ----
  {
    human_autoscale + human_SI + human_base_1024,
!   // GDG: Add human_B
!   human_autoscale + human_SI + human_B
  };
