Bug#1064343: tput sgr0 adds uncalled-for codes

Adam Borowski Tue, 20 Feb 2024 22:48:14 -0800

On Tue, Feb 20, 2024 at 07:41:42PM +0100, Sven Joachim wrote:
> > On Tue, Feb 20, 2024 at 04:15:30PM +0800, Paul Wise wrote:
> >>    $ tput sgr0 | hd
> >>    00000000  1b 28 42 1b 5b 6d                                 |.(B.[m|
> >
> > Here's the culprit.  The code you asked for is "\e[0m" -- shortenable to
> > non-canonical but valid "\e[m", which is the second half of tput's output.
> >
> > What you did not ask for, is "\e(B", which is not allowed in UTF-8 mode,
> > and in non-Unicode world would switch to an ancient "US" charset.
> 
> Maybe that is true for the Linux console, but we are talking about xterm
> here.


It's not a property of the terminal, but of ECMA-35.

And what "xterm" are you talking about?  tput has no way to know the
terminal on the other side, as the string TERM=xterm (and
TERM=xterm-256color) applies to over a hundred different terminals using
tens of different code bases.  And you can't even blame their authors, as:
 * most Unices stopped maintaining their terminfo databases (e.g. Solaris
   still hasn't learned about TERM=linux; Solaris is no longer relevant,
   but it was for most of that time frame)
 * even if the databases were maintained, a new terminal would become useful
   only several years after it gets released (as the terminfo entry would
   need to be deployed on every box you might possibly ssh into, with
   failure mode being complete breakage of any terminfo-using program)
Thus, putting aside historic terminals, there are only three TERM values:
 * linux
 * rxvt (used by its derivatives like aterm)
 * xterm (everything else)
(I'm skipping decorations like -256color, which most programs hard-code
anyway.  I thus had to implement 256-color fallbacks in the kernel in 2016;
it seems that e.g. 24-bit color is moving the same way.)

> > Putting aside arguments if this code is allowed or not (eg. the author of
> > Putty has strong feelings on the matter), it's very clearly not what you
> > asked for, thus the real bug is on tput's side.

> > Thus:
> > "tput sgr0" should produce sgr0, not setusg0 sgr0.
> 
> It does of course produce sgr0, i.e. it emits whatever escape sequence
> $TERM's terminfo entry declares as sgr0.  In the case of xterm-256color,
> sgr0=\E(B\E[m.

And it's that entry that's wrong.  sgr0 means "\e[0m" (or "\e[m"); see
e.g. the docs for real xterm: https://www.xfree86.org/current/ctlseqs.html

> The reason for including \E(B here is that sgr0 should cancel the
> effects of a previous smacs (start alternate character set) sequence and
> thus includes the rmacs (end alternate character set) escape sequence.

Then it combines two completely different concepts under one label.  SGR is
for character attributes; G0/G1 are for encoding.
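
To make the split concrete, here is a minimal Python sketch (my own
illustration, not anything tput or ncurses does) that takes the bytes
shown in the hexdump above and labels each escape sequence.  The names
classify() and strip_g0() are hypothetical, and the pattern for the G0
designation is deliberately simplified to a single alphanumeric final
byte.

```python
import re

# Bytes reported in this thread for "tput sgr0" (sgr0=\E(B\E[m).
SGR0_XTERM = b"\x1b(B\x1b[m"

# ESC ( F      designates a character set into G0 (charset/encoding side).
# ESC [ ... m  is SGR, Select Graphic Rendition (character attributes).
# Assumption: only a single alphanumeric final byte is accepted for the
# G0 designation; real ISO 2022 parsing allows more forms.
TOKEN = re.compile(rb"\x1b\((?P<g0>[0-9A-Za-z])|\x1b\[(?P<sgr>[0-9;]*)m")


def classify(seq: bytes):
    """Return a list of (kind, raw_bytes) for each recognised sequence."""
    out = []
    pos = 0
    while pos < len(seq):
        m = TOKEN.match(seq, pos)
        if m is None:
            raise ValueError(f"unrecognised bytes at offset {pos}")
        kind = "g0-designate" if m.group("g0") is not None else "sgr"
        out.append((kind, m.group(0)))
        pos = m.end()
    return out


def strip_g0(seq: bytes) -> bytes:
    """Keep only the SGR sequences, dropping G0 designations."""
    return b"".join(raw for kind, raw in classify(seq) if kind == "sgr")


if __name__ == "__main__":
    for kind, raw in classify(SGR0_XTERM):
        print(kind, raw)
    print(strip_g0(SGR0_XTERM))
```

Run on the xterm-256color string, this yields one G0 designation followed
by one SGR reset; strip_g0() leaves only the latter.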

> People are relying on this behavior, see #595484 for instance.

Seems like an XKCD 1172 case.

> Closing the bug, because everything works as intended.

...

Well, I'm not going to fight a BTS war, but I don't agree with your
decision.

I'll work around this misbehaviour (as it's no extra work for me: I need
to handle legitimately occurring G0/G1 changes anyway).  Still, it is a bug
even if its severity is negligible: thanks to PuTTY's author's stubbornness
no maintained software uses G0/G1 anymore.

Thus, the only real fallout is bloated terminal output.  Output is still
too slow on serial links or inferior terminals (I felt bad about Scaleway's
web console just hours ago); saving three bytes per sgr0 is not much, but
it is a very frequently used sequence.
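
For a rough sense of scale, a back-of-the-envelope Python sketch.  The
9600 baud rate and 10 bits per byte (8N1 framing) are assumptions picked
purely for illustration, and extra_seconds() is a hypothetical helper.

```python
WITH_G0 = b"\x1b(B\x1b[m"  # what the terminfo entry emits for sgr0
PLAIN = b"\x1b[m"          # the SGR reset alone


def extra_seconds(resets: int, baud: int = 9600, bits_per_byte: int = 10) -> float:
    """Extra transmission time caused by the G0 designation.

    Assumes a serial link where each byte costs bits_per_byte bit times
    (start bit + 8 data bits + stop bit by default).
    """
    extra_bytes = (len(WITH_G0) - len(PLAIN)) * resets
    return extra_bytes * bits_per_byte / baud


if __name__ == "__main__":
    print(len(WITH_G0) - len(PLAIN))  # bytes saved per sgr0
    print(extra_seconds(1000))        # seconds per 1000 resets at 9600 baud
```

Under those assumptions the overhead is about 3 ms per reset: small, but
it adds up in output that resets attributes on every line.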


Meow!
-- 
⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁ Vat kind uf sufficiently advanced technology iz dis!?
⢿⡄⠘⠷⠚⠋⠀                                 -- Genghis Ht'rok'din
⠈⠳⣄⠀⠀⠀⠀
