Hi Alex,

At 2026-02-22T12:47:58+0100, Alejandro Colomar wrote:
> On 2026-02-21T21:01:36-0600, G. Branden Robinson wrote:
> > At 2026-02-21T23:52:08+0100, Alejandro Colomar wrote:
> > > I'd like to know why you did that.
> > > 
> > >   commit 4b7a0fe5ab5d5155bd499cf9506a91a1f4bc0125
> > >   Author: G. Branden Robinson <[email protected]>
> > >   Date:   2025-12-06 18:26:02 -0600
> > > 
> > >       src/utils/xtotroff/xtotroff.c: Fix code style nit.
> > >       
> > >       * src/utils/xtotroff/xtotroff.c:
> > >         (CanonicalizeFontName, FontNamesAmbiguous, MapFont, main):
> > >         Explicitly cast unused return values of printf(3)-family
> > >         functions to `void`.
> > > 
> > > I've been updating code to remove those casts, because they don't
> > > do much good.  It's essentially just noise.
> > 
> > I disagree.  I feel that one should never call a function with no
> > evident awareness of its return type.  (If its return type is
> > "void", then no cast should be made!)  I acknowledge that
> > type-slovenliness is a proud tradition in C, but I reject it.  It
> > only makes life harder for medium- to large-scale projects.
> 
> Isn't "I don't read the return value" a proof of evident awareness
> that either a) a function has no return value or b) it has a value we
> don't care about?

I'm not convinced of that.

> Why would it be important to differentiate those two?  And even more
> importantly, when a mistake is made, how do we differentiate a) the
> programmer know more than us and knew it could discard the value vs b)
> the programmer was clueless and dumbly dumped the value?

As a general rule these two cases are hard to distinguish, especially in
"terse" languages like C that historically were designed to meet the
needs of a small pool of closely collaborating practitioners of high
average competence who were often also experts in production of assembly
language programs for the only machine that the language then targeted,
the DEC PDP-11.

Much about Pascal has been derided by C partisans because Wirth
explicitly designed Pascal as a teaching language.  In my opinion (and
others', famously Kernighan's), he punted too many language features,[1]
but his attention to the needs of language learners was, I maintain, not
ill-conceived in the least.

C advocates trumpeted loudly that theirs was a language for people who
knew what they were doing; many of these same advocates went on to prove
that they themselves didn't.  The ill-kept secret is that _everybody_
has moments where they don't know what they're doing.  Much more
mischief arises from attempts to conceal this fact than from designing
systems around the expectation of it.  This is why lint(1) got written
even before C became portable.  Or around the same time, anyway.[3]

> > Languages like Pascal and Ada distinguish "procedures" (which affect
> > the state of the system only via "side effects") from "functions"
> > that might be "pure", but even if not, generally communicate
> > information back to their callers via return values.
> 
> We have something close in C: __attribute__((__unsequenced__)).

Not looking it up...but reasoning from my crude understanding of C's
definition of a "sequence point", which involves the semantics of memory
ordering, in which there are (only?) two major approaches, total-store
ordering (TSO) and acquire-release semantics, I think I can see how one
can apply "unsequencedness" to a "pure function".

It's my understanding that, unless something's changed in the past 5
years or so, x86 processors have always implemented TSO, and so many
programmers, including Linux kernel hackers, have tended to assume those
are the only memory sequencing semantics that exist.  As I understand
it (probably poorly), TSO is great for satisfying programmer
expectations, but it causes many cache flushes/invalidations that are,
strictly, unnecessary given the data being manipulated, and this is bad
for performance.  And because major decisions in technology firms are
made on the sophisticated basis of "number go up!",[2][4] processor
vendors have pressured their engineers to cut corners in cache
management to recover performance lost by attention to correctness.
Thus Spectre and Meltdown and Rowhammer and their many progeny.

Acquire-release is trickier to understand (at least for me, or maybe I
was foolish to attempt to learn the concept from the RISC-V processor
manual[5]) and furthermore might demand exposure in programming
languages in ways that will require a lot of adaptation on the part of
practitioners.  There's some resemblance, I think, to C's "register"
storage class, which started out being very important, then became a
relic ignored by compilers and deprecated by language instructors, and
now may be picking up fresh new semantics (like the also-long-ignored
"auto") to attack the acquire-release semantic annotation problem
without introducing a new keyword, which is thought always to flip over
the table in C.  (A thought that's somewhat warranted because, yup, no
name spaces.)

It might not be a terrible fit; a "register" variable, literally
interpreted, is a datum that does not leave a processor once initially
loaded from memory (if necessary, which it often isn't, as initializing
a loop index can often use a machine instruction with an "immediate"
addressing mode, and frequently this immediate value is zero), meaning
it shouldn't land in any caches.  My intuition tells me that you might
be able to get to acquire-release semantics from there.  (I'm sure
there's more to it than that.  Rather than simply proclaiming that
load/stores with acquire-release semantics "shall not touch cache [let
alone 'real' memory]"--which is impractical because interrupts happen--I
think it's more correct to say that they mean that _if_ a load/store
from/to a memory location disrupts "acquisition" of the same location
prior to "release", then the acquire-release load/store has to be
re-attempted.  This is probably something that would be handled by one's
language runtime support, but C famously prefers to have almost no
runtime support at all.  "Close to the metal!")

If you have some references that would help me understand this stuff
better, pitch 'em at me!

> It has issues, though, as it's too easy to misuse that attribute,
> since the compiler will blindly trust anything you say.

This seems to be a recurring problem with attributes and, before them,
type qualifiers in C.  See also "const" and "restrict".  Not too many
days ago I saw reference to an old screed against "volatile" by Linus
Torvalds.

For a language written primarily by people who know what they're doing,
there sure do seem to be a lot of people who don't know what they're
doing.  :-|

> [...]
> > No proud C hacker ever lets correctness get in the way of
> > performance.)
> 
> I am a proud C hacker, and I am proud to put correctness before
> performance.  :)

And I applaud you for that.  I predict you'll get into a lot of fights
with others who don't, but won't admit it.  "Number go up!"

Maybe you can convince me to start saying "few" instead.  :D

> > Of course historically, C functions returned ints by default even
> > without declaration because C is descended from the "typeless"
> > B.[...] And if nobody could think of anything _good_ to stick into
> > the return value, well then they'd come up with something crappy and
> > return that.[...]  This convention persisted even for functions that
> > returned pointers.  Don't think too hard about what the best datum
> > to return is, just return _something_--like one of the arguments the
> > caller already knows because they passed it in.
> > 
> > https://www.symas.com/post/the-sad-state-of-c-strings
> > https://dgtalhaven.wordpress.com/2020/05/15/schlemiel-the-painters-algorithm/
> 
> In some cases, returning the input pointer is actually useful.  I use
> the return value of memcpy(3) and strcpy(3) to construct interesting
> one-liner macros.  Without that return value, it would be impossible.
> 
>       #define strndupa(s, n)  strncat(strcpy(alloca(n + 1), ""), s, n)
> 
> You might complain that this is inefficient (and indeed, you have a
> link to Schlemiel the Painter above).  However, I'm proud to prefer
> correctness over performance here.

Heh.  Fair point.  Let me pivot then.  I think the symmetry between
fprintf and printf on the one hand (the latter is really just a
specialization of the former to `stdout`) and sprintf on the other
has proven to be deceptive and a trap, not just for the unwary C
programmer, but for nearly all of us.

To see my point requires only one question to be asked and considered.

Why don't we have nprintf() and fnprintf()?

The answer, which the rock star brogrammer prefaces with a loud sigh or
"DUH!"--if you get a real answer at all--is that interactions with
memory and device output differ.  (A C stdio `FILE` may be "buffered" in
memory, but you can't amend its serialized operations or, within the
programming language, reliably stop dispatched write operations on a
file from hitting whatever non-volatile storage backs it.)

Writing to network sockets or disk files or devices is pretty much a
fire-and-forget operation.  Once you've done it, it's out of your hands.
You can't "overrun" a "buffer".  Any event like that is handled via
other means than the return value of the function performing the write.

Writing to memory buffers as such is fundamentally different from I/O.

> > My explicit discards of the return value remind the reader (often
> > myself) that, yes, I'm aware that printf() has a return value, and
> > that I don't need it.
> 
> Are you sure?  I've seen snprintf(3) calls where you've discarded the
> return value.  Should we conclude that you don't care about
> truncation?  That's usually quite a bad bug.  What if you thought
> the buffer size would be enough but your calculation was wrong?

Me personally?  Or the groff code base in general?  I'm a very long way
from replacing all of groff's C/C++ code with my own work.  That's not
even a goal I have.  I select a task, try to constrain its scope, and
try to confine my activities to resolution of that task.  Scope does
frequently creep, but I try to (a) stage commits and (b) file Savannah
tickets to log general clean-ups that I think should occur.

I admit that I don't maintain perfect discipline here, but I try.

> Should we conclude that you don't care about truncation?  That's
> usually quite a bad bug.  What if you thought the buffer size
> would be enough but your calculation was wrong?  If you had added
> error handling, you'd catch the bug pretty quickly.

I agree.  There is generally less error handling in groff than I would
prefer.  One area I've devoted a lot of attention to is input validation
in the formatter.  These additions have occasionally provoked unhappy
responses from users who did not welcome their slop being pointed out to
them.  "What do you MEAN my terminal has no 'C' font?!"  (Most, however,
seem to adapt.)

> Alternatively, if you had used sprintf(3) and _FORTIFY_SOURCE, you'd
> also catch the bug pretty quickly.

I frequently build groff with `-D_FORTIFY_SOURCE=2`, following a
suggestion from Bjarni Ingi Gislason.  Frequently--and always before
pushing.

> However, silent truncation will result in the bug surviving ages.

True, and if you know of any cases of string buffer truncation in the
groff code base, I'd appreciate your telling this development community
about them.

> > "But what if everybody did that?  You'd clutter the world with void
> > typecasts!"
> > 
> > Yes, if I persist in using crappy APIs.
> 
> Are you saying snprintf(3) is a crappy API (and I do say that, FWIW,
> but for other reasons)?  Returning a value is quite necessary.

See above regarding the misleading similarity between fprintf and
sprintf.

It was `printf()` that you raised in your original post.  ;-)

> FWIW, most C programs are still single-threaded today.  Let the shell
> combine them.

I really like a new Ada 2022 feature.

http://ada-auth.org/standards/22over/html/Ov22-2-1.html

If I understand correctly, Doug McIlroy has more than once pointed out
how GNU troff could make use of such a mechanism at the language level
to speedily make Knuth-Plass style paragraph formatting decisions.

But we don't have it in C or C++.

It would be massively tedious to shove out to the shell, and possibly
ruinous of any performance advantage that would be otherwise gained.

> > The standard C library was, and is, deserving of sterner scrutiny
> > than it gets--and now that I mention it, gets() was far from the
> > only grievous wart it has carried.  We have paid in confusion,
> > wasted time, and unclear practices, and will continue to do so,
> > unless we slaughter sacred cows and reconsider popular idioms from
> > first principles.
> 
> I'm working on that.  :)

Full speed ahead!  :D

Regards,
Branden

P.S.  Aha!  While composing this mail I stumbled across a post-mortem
      (my characterization) on Pascal, dated 1993, by Wirth himself!

      I'll be reading this closely.

      http://pascal.hansotten.com/uploads/wirth/Recollection%20On%20Dev%20of%20Pascal.pdf

[1] The only specific _language feature_ absent from Pascal that I
    recall Kernighan complaining about is what was later standardized in
    ISO 7185 as "conformant arrays", which even then were the (only)
    "optional" feature of the standard language.  It seems to me like
    such a shockingly obvious improvement, with no downside even to
    novice student programmers, that I have trouble imagining who was
    opposed to it.  Wirth himself?  Pascal vendors?  I wonder if Turbo
    Pascal 1.0 had them.

    https://stackoverflow.com/questions/8482318/what-is-a-conformant-array

    Kernighan did make a generalized complaint about Pascal's lack of
    escape hatches--yet we do not find "asm" documented in any C
    language standard.

[2] https://en.wikipedia.org/wiki/Number_Go_Up

[3] Or around the same time.  It can be tricky to date stuff from the
    mid-1970s precisely.  lint(1) first appears in Seventh Edition Unix
    (1979), which places it after Sixth Edition (1975), but leaves a
    generous four years of slosh.  The "first port" of C to a platform
    other than the PDP-11 is reputedly that to the Interdata 7/32, which
    was undertaken through cooperation with the Bell Labs CSRC.  For
    some reason this port was, by 1978 (see below), occluded by another
    port (by the same team?) to the related Interdata 8/32 machine.

    http://bitsavers.informatik.uni-stuttgart.de/bits/Interdata/32bit/unix/univWollongong_v6/miller.pdf

    But apparently there was _another_ port (or reimplementation?) of
    the Ritchie C compiler for the IBM 360 (a machine still famous due
    to Fred Brooks's _The Mythical Man-Month_ recounting its troubled
    development).  I can't find good sources for this claim; I think I
    read about it on the TUHS list.  K&R, even in the first edition of
    _The C Programming Language_ (1978), mention ports to the IBM 370
    and the Honeywell 6000 as well as the Interdata 8/32, but attach no
    chronology to these efforts.  (Understandably--as ever in software
    engineering, it can be hard to define when a complex project is
    "done".  I've never heard of a conformance test suite for a C
    compiler and runtime--as minimal as the latter would be--existing as
    far back as the 1970s.  An expert might know better.)

[4] Good general discussion here.  Many varying, sometimes conflicting,
    opinions.  Take what you want and leave what you don't.  :)

    https://crookedtimber.org/2008/04/30/is-there-a-general-skill-of-management/

[5] https://lists.riscv.org/g/sig-documentation/attachment/499/0/riscv-unprivileged.pdf
