Hi Alex,

At 2026-02-22T12:47:58+0100, Alejandro Colomar wrote:
> On 2026-02-21T21:01:36-0600, G. Branden Robinson wrote:
> > At 2026-02-21T23:52:08+0100, Alejandro Colomar wrote:
> > > I'd like to know why you did that.
> > >
> > > commit 4b7a0fe5ab5d5155bd499cf9506a91a1f4bc0125
> > > Author: G. Branden Robinson <[email protected]>
> > > Date:   2025-12-06 18:26:02 -0600
> > >
> > >     src/utils/xtotroff/xtotroff.c: Fix code style nit.
> > >
> > >     * src/utils/xtotroff/xtotroff.c:
> > >       (CanonicalizeFontName, FontNamesAmbiguous, MapFont, main):
> > >       Explicitly cast unused return values of printf(3)-family
> > >       functions to `void`.
> > >
> > > I've been updating code to remove those casts, because they don't
> > > do much good.  It's essentially just noise.
> >
> > I disagree.  I feel that one should never call a function with no
> > evident awareness of its return type.  (If its return type is
> > "void", then no cast should be made!)  I acknowledge that
> > type-slovenliness is a proud tradition in C, but I reject it.  It
> > only makes life harder for medium- to large-scale projects.
>
> Isn't "I don't read the return value" a proof of evident awareness
> that either a) a function has no return value or b) it has a value we
> don't care about?
I'm not convinced of that.

> Why would it be important to differentiate those two?  And even more
> importantly, when a mistake is made, how do we differentiate a) the
> programmer knew more than us and knew it could discard the value vs
> b) the programmer was clueless and dumbly dumped the value?

As a general rule these two cases are hard to distinguish, especially
in "terse" languages like C that historically were designed to meet the
needs of a small pool of closely collaborating practitioners of high
average competence, who were often also experts in producing assembly
language programs for the only machine the language then targeted, the
DEC PDP-11.

Much about Pascal has been derided by C partisans because Wirth
explicitly designed Pascal as a teaching language.  In my opinion (and
others', famously Kernighan's), he punted too many language
features,[1] but his attention to the needs of language learners was, I
maintain, not ill-conceived in the least.  C advocates trumpeted loudly
that theirs was a language for people who knew what they were doing;
many of these same advocates went on to prove that they themselves
didn't.  The ill-kept secret is that _everybody_ has moments where they
don't know what they're doing.  Much more mischief arises from attempts
to conceal this fact than from designing systems around the expectation
of it.  This is why lint(1) got written even before C became portable.
Or around the same time, anyway.[3]

> > Languages like Pascal and Ada distinguish "procedures" (which
> > affect the state of the system only via "side effects") from
> > "functions" that might be "pure", but even if not, generally
> > communicate information back to their callers via return values.
>
> We have something close in C: __attribute__((__unsequenced__)).

Not looking it up...but reasoning from my crude understanding of C's
definition of a "sequence point", which involves the semantics of
memory ordering, in which there are (only?)
two major approaches, total-store ordering (TSO) and acquire-release
semantics, I think I can see how one can apply "unsequencedness" to a
"pure function".

It's my understanding that, unless something's changed in the past 5
years or so, x86 processors have always implemented TSO, and so many
programmers, including Linux kernel hackers, have tended to assume
those are the only memory-sequencing semantics that exist.  As I
understand it (probably poorly), TSO is great for satisfying programmer
expectations but causes many cache flushes/invalidations that are,
strictly, unnecessary given the data being manipulated, and this is bad
for performance; and because major decisions in technology firms are
made on the sophisticated basis of "number go up!",[2][4] processor
vendors have pressured their engineers to cut corners in cache
management to recover performance lost to attention to correctness.
Thus Spectre and Meltdown and Rowhammer and their many progeny.

Acquire-release is trickier to understand (at least for me, or maybe I
was foolish to attempt to learn the concept from the RISC-V processor
manual[5]) and furthermore might demand exposure in programming
languages in ways that will require a lot of adaptation on the part of
practitioners.  There's some resemblance, I think, to C's "register"
storage class, which started out being very important, then became a
relic ignored by compilers and deprecated by language instructors, and
now may be picking up fresh new semantics (like the also-long-ignored
"auto") to attack the acquire-release semantic annotation problem
without introducing a new keyword, which is thought always to flip over
the table in C.  (A thought that's somewhat warranted because, yup, no
name spaces.)
The "register" keyword might not be a terrible fit; a "register"
variable, literally interpreted, is a datum that does not leave a
processor once initially loaded from memory (if necessary, which it
often isn't, as initializing a loop index can often use a machine
instruction with an "immediate" addressing mode, and frequently this
immediate value is zero), meaning it shouldn't land in any caches.  My
intuition tells me that you might be able to get to acquire-release
semantics from there.

(I'm sure there's more to it than that.  Rather than simply proclaiming
that loads/stores with acquire-release semantics "shall not touch cache
[let alone 'real' memory]"--which is impractical because interrupts
happen--I think it's more correct to say that _if_ a load/store from/to
a memory location disrupts "acquisition" of the same location prior to
"release", then the acquire-release load/store has to be re-attempted.
This is probably something that would be handled by one's language
runtime support, but C famously prefers to have almost no runtime
support at all.  "Close to the metal!")

If you have some references that would help me understand this stuff
better, pitch 'em at me!

> It has issues, though, as it's too easy to misuse that attribute,
> since the compiler will blindly trust anything you say.

This seems to be a recurring problem with attributes and, before them,
type qualifiers in C.  See also "const" and "restrict".  Not too many
days ago I saw a reference to an old screed against "volatile" by Linus
Torvalds.  For a language written primarily for people who know what
they're doing, there sure do seem to be a lot of people who don't know
what they're doing.  :-|

> [...]
> > No proud C hacker ever lets correctness get in the way of
> > performance.)
>
> I am a proud C hacker, and I am proud to put correctness before
> performance.  :)

And I applaud you for that.  I predict you'll get into a lot of fights
with others who don't, but won't admit it.  "Number go up!"
Maybe you can convince me to start saying "few" instead.  :D

> > Of course historically, C functions returned ints by default even
> > without declaration because C is descended from the "typeless"
> > B.[...]  And if nobody could think of anything _good_ to stick into
> > the return value, well then they'd come up with something crappy
> > and return that.[...]  This convention persisted even for functions
> > that returned pointers.  Don't think too hard about what the best
> > datum to return is, just return _something_--like one of the
> > arguments the caller already knows because they passed it in.
> >
> > https://www.symas.com/post/the-sad-state-of-c-strings
> > https://dgtalhaven.wordpress.com/2020/05/15/schlemiel-the-painters-algorithm/
>
> In some cases, returning the input pointer is actually useful.  I use
> the return value of memcpy(3) and strcpy(3) to construct interesting
> one-liner macros.  Without that return value, it would be impossible.
>
>     #define strndupa(s, n)  strncat(strcpy(alloca(n + 1), ""), s, n)
>
> You might complain that this is inefficient (and indeed, you have a
> link to Schlemiel the Painter above).  However, I'm proud to prefer
> correctness over performance here.

Heh.  Fair point.  Let me pivot, then.

I think the symmetry between fprintf and printf on the one hand (the
latter is really just a specialization of the former to the stream
`stdout`) and sprintf on the other has proven to be deceptive and a
trap, not just for the unwary C programmer, but for nearly all of us.
To see my point requires only one question to be asked and considered.

Why don't we have nprintf() and fnprintf()?

The answer, which the rock star brogrammer starts with a loud sigh or
"DUH!"--if you get a real answer at all--is that interactions with
memory and device output differ.
(A C stdio `FILE` may be "buffered" in memory, but you can't amend its
serialized operations or, within the programming language, reliably
stop dispatched write operations on a file from hitting whatever
non-volatile storage backs it.)  Writing to network sockets or disk
files or devices is pretty much a fire-and-forget operation.  Once
you've done it, it's out of your hands.  You can't "overrun" a
"buffer".  Any event like that is handled via means other than the
return value of the function performing the write.  Writing to memory
buffers as such is fundamentally different from I/O.

> > My explicit discards of the return value remind the reader (often
> > myself) that, yes, I'm aware that printf() has a return value, and
> > that I don't need it.
>
> Are you sure?  I've seen snprintf(3) calls where you've discarded the
> return value.  Should we conclude that you don't care about
> truncation?  That's usually a quite bad bug.  What if you thought the
> buffer size would be enough but your calculation was wrong?

Me personally?  Or the groff code base in general?  I'm a very long way
from replacing all of groff's C/C++ code with my own work.  That's not
even a goal I have.  I select a task, try to constrain its scope, and
try to confine my activities to resolution of that task.  Scope does
frequently creep, but I try to (a) stage commits and (b) file Savannah
tickets to log general clean-ups that I think should occur.  I admit
that I don't maintain perfect discipline here, but I try.

> [...]  If you had added error handling, you'd catch the bug pretty
> quickly.

I agree.  There is generally less error handling in groff than I would
prefer.  One area I've devoted a lot of attention to is input
validation in the formatter.
Those input-validation additions have occasionally provoked unhappy
responses from users who did not welcome their slop being pointed out
to them.  "What do you MEAN my terminal has no 'C' font?!"  (Most,
however, seem to adapt.)

> Alternatively, if you had used sprintf(3) and _FORTIFY_SOURCE, you'd
> also catch the bug pretty quickly.

I frequently build groff with `-D_FORTIFY_SOURCE=2`, following a
suggestion from Bjarni Ingi Gislason.  Frequently--and always before
pushing.

> However, silent truncation will result in the bug surviving ages.

True, and if you know of any cases of string buffer truncation in the
groff code base, I'd appreciate your telling this development community
about them.

> > "But what if everybody did that?  You'd clutter the world with void
> > typecasts!"
> >
> > Yes, if I persist in using crappy APIs.
>
> Are you saying snprintf(3) is a crappy API (and I do say that, FWIW,
> but for other reasons)?  Returning a value is quite necessary.

See above regarding the misleading similarity between fprintf and
sprintf.  It was `printf()` that you raised in your original post.  ;-)

> FWIW, most C programs are still single-threaded today.  Let the shell
> combine them.  I really like a new Ada 2022 feature.

http://ada-auth.org/standards/22over/html/Ov22-2-1.html

If I understand correctly, Doug McIlroy has more than once pointed out
how GNU troff could make use of such a mechanism at the language level
to speedily make Knuth-Plass-style paragraph formatting decisions.  But
we don't have it in C or C++.  It would be massively tedious to shove
out to the shell, and possibly ruinous of any performance advantage
that would otherwise be gained.

> > The standard C library was, and is, deserving of sterner scrutiny
> > than it gets--and now that I mention it, gets() was far from the
> > only grievous wart it has carried.
> > We have paid in confusion, wasted time, and unclear practices, and
> > will continue to do so, unless we slaughter sacred cows and
> > reconsider popular idioms from first principles.
>
> I'm working on that.  :)

Full speed ahead!  :D

Regards,
Branden

P.S.  Aha!  While composing this mail I stumbled across a post-mortem
(my characterization) on Pascal, dated 1993, by Wirth himself!  I'll be
reading it closely.

http://pascal.hansotten.com/uploads/wirth/Recollection%20On%20Dev%20of%20Pascal.pdf

[1] The only specific _language feature_ absent from Pascal that I
    recall Kernighan complaining about is what was later standardized
    in ISO 7185 as "conformant arrays", which even then were the (only)
    "optional" feature of the standard language.  It seems to me like
    such a shockingly obvious improvement, with no downside even to
    novice student programmers, that I have trouble imagining who was
    opposed to it.  Wirth himself?  Pascal vendors?  I wonder if Turbo
    Pascal 1.0 had them.

    https://stackoverflow.com/questions/8482318/what-is-a-conformant-array

    Kernighan did make a generalized complaint about Pascal's lack of
    escape hatches--yet we do not find "asm" documented in any C
    language standard.

[2] https://en.wikipedia.org/wiki/Number_Go_Up

[3] Or around the same time.  It can be tricky to date stuff from the
    mid-1970s precisely.  lint(1) first appears in Seventh Edition Unix
    (1979), which places it after Sixth Edition (1975), but leaves a
    generous four years of slosh.  The "first port" of C to a platform
    other than the PDP-11 is reputedly that to the Interdata 7/32,
    which was undertaken in cooperation with the Bell Labs CSRC.  For
    some reason this port was, by 1978 (see below), occluded by another
    port (by the same team?) to the related Interdata 8/32 machine.

    http://bitsavers.informatik.uni-stuttgart.de/bits/Interdata/32bit/unix/univWollongong_v6/miller.pdf

    But apparently there was _another_ port (or reimplementation?)
    of the Ritchie C compiler for the IBM 360 (a machine still famous
    due to Fred Brooks's _The Mythical Man-Month_ recounting its
    troubled development).  I can't find good sources for this claim; I
    think I read about it on the TUHS list.  K&R, even in the first
    edition of _The C Programming Language_ (1978), mention ports to
    the IBM 370 and the Honeywell 6000 as well as the Interdata 8/32,
    but attach no chronology to these efforts.  (Understandably--as
    ever in software engineering, it can be hard to define when a
    complex project is "done".  I've never heard of a conformance test
    suite for a C compiler and runtime--as minimal as the latter would
    be--existing as far back as the 1970s.  An expert might know
    better.)

[4] Good general discussion here.  Many varying, sometimes conflicting,
    opinions.  Take what you want and leave what you don't.  :)

    https://crookedtimber.org/2008/04/30/is-there-a-general-skill-of-management/

[5] https://lists.riscv.org/g/sig-documentation/attachment/499/0/riscv-unprivileged.pdf