Hi Gavin,

At 2026-02-01T11:50:36+0000, Gavin Smith wrote:
> On Sun, Feb 01, 2026 at 05:18:19AM -0600, G. Branden Robinson wrote:
> > I'm happy to explain, but beyond Egmont's "gist" above, ECMA-48
> > is the controlling authority for the structure of these escape
> > sequences.
> > 
> > http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-048.pdf
> 
> That link didn't work for me.

For me, neither--now.

> This one does:
> 
> https://ecma-international.org/wp-content/uploads/ECMA-48_5th_edition_june_1991.pdf

Thanks!  I'll fix this in groff's documentation.

According to the Internet Archive's Wayback Machine, ECMA killed the old
URL between 12 January and 25 February 2021.[1]

I apparently added the URL to the grotty(1) man page on 16 January 2020.
https://cgit.git.savannah.gnu.org/cgit/groff.git/commit/?id=4520668e9ec756ecd6486bc5ce937809e1b4f543

So, let it be known to all: if you need to kill a stable URL anywhere on
the Internet but lack authority to do so, just get me to add it to groff
documentation.

It'll be dead in a year.

> What's missing from your explanation is the set of bytes that can
> occur in such a sequence.
> 
> From the ECMA link above:
> 
>   OSC is used as the opening delimiter of a control string for
>   operating system use. The command string following may consist of a
>   sequence of bit combinations in the range 00/08 to 00/13 and 02/00
>   to 07/14.  The control string is closed by the terminating delimiter
>   STRING TERMINATOR (ST). The interpretation of the command string
>   depends on the relevant operating system
> 
> These "bit combinations" would be more usually described as bytes 0x08
> to 0x0d and 0x20 to 0x7e.  Other bytes are invalid (e.g. non-ASCII
> UTF-8).

Thanks for correcting my omission.  I was focussing more on describing
what Eli could expect grotty to emit than on undertaking an exegesis of
Egmont's OSC 8 specification per se.  If further formalized, OSC 8 could
use some tightening up.
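
For anyone converting between the two notations: ECMA-48's "xx/yy" names
a column and a row of the code table, so the byte value is
16 * column + row.  Here is a minimal sketch in Python of that
arithmetic and of the payload-byte test Gavin describes.  The function
names are mine, chosen for illustration; they come from no
specification.

```python
def bit_combination(column: int, row: int) -> int:
    """Convert ECMA-48 column/row notation (e.g. 07/14) to a byte value."""
    if not (0 <= column <= 15 and 0 <= row <= 15):
        raise ValueError("column and row must each be in 0..15")
    return column * 16 + row


def is_osc_payload_byte(b: int) -> bool:
    """True if b may appear in an OSC command string per the quoted text:
    00/08 to 00/13, or 02/00 to 07/14."""
    return (bit_combination(0, 8) <= b <= bit_combination(0, 13)
            or bit_combination(2, 0) <= b <= bit_combination(7, 14))
```

With these, 07/14 comes out as 0x7e, and neither BEL (0x07), ESC (0x1b),
nor any byte of a UTF-8 multibyte sequence passes the test.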

> If your only intention is to strip out and ignore such sequences, you
> can ignore the syntax involving semicolons and key-value pairs, and
> just skip over the sequence of permissible bytes.

Egmont's examples, however, consistently show a pair of semicolons
between the "8" and the URL.  I interpreted this as recommended
practice, albeit not normative.
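
For a reader who does want the fields rather than just skipping them,
here is a rough sketch in Python.  It assumes, going by my reading of
Egmont's gist, that the string between OSC and ST has the shape
"8 ; params ; URI", with params a possibly empty colon-separated list of
key=value pairs.  The function name and return shape are hypothetical.

```python
def parse_osc8_payload(payload: str):
    """Split an OSC 8 command string into (params, uri).

    Returns None if the string does not look like OSC 8.  An empty URI
    is how a hyperlink is closed.
    """
    # Split on the first two semicolons only, so that any later
    # semicolons stay part of the URI field.
    fields = payload.split(";", 2)
    if len(fields) != 3 or fields[0] != "8":
        return None
    params = {}
    for pair in fields[1].split(":"):
        key, sep, value = pair.partition("=")
        if sep and key:          # ignore empty or malformed pairs
            params[key] = value
    return params, fields[2]
```

The pair of adjacent semicolons in the examples is then simply an empty
params field.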

> > 5c.  _Optionally_, you can treat a BEL (C-g) as equivalent to a
> > string terminator.  This practice is outside of the ECMA-48
> > specification, but is sometimes produced by applications targeting
> > "color xterms" of the 1990s written by people who lacked access to,
> > or ignored, ECMA-48, and clunkily implemented SGR support.  I would
> > not support this practice for OSC 8; I know of nothing that produces
> > such ill-terminated sequences for its much newer convention.  I
> > wouldn't even mention it, except that I fear that some terminal
> > emulator developer who spends more effort on promotional activities
> > than on ensuring code quality will bring it up.
> 
> BEL (0x07) is also an invalid byte to occur within the sequence, so if
> not treated specially as a terminator, should be treated as invalid
> input (which would presumably terminate the OSC processing).

I agree, and if I were writing an OSC 8 interpreter, that's what I would
do.
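
As an illustration only, a minimal sketch of that approach in Python,
for the "strip and ignore" case.  It recognizes only the 7-bit forms
(ESC ] to open, ESC \ to close), and when the command string is cut
short by an invalid byte such as BEL, it leaves that byte in place for
ordinary processing.  That last choice is my assumption; dropping the
byte would be equally defensible.  The names are hypothetical.

```python
ESC = 0x1B


def _is_payload(b: int) -> bool:
    # Bytes ECMA-48 permits inside an OSC command string.
    return 0x08 <= b <= 0x0D or 0x20 <= b <= 0x7E


def strip_osc(data: bytes) -> bytes:
    """Remove OSC control strings from data, leaving everything else."""
    out = bytearray()
    i, n = 0, len(data)
    while i < n:
        if data[i] == ESC and i + 1 < n and data[i + 1] == 0x5D:  # ESC ]
            i += 2
            while i < n and _is_payload(data[i]):
                i += 1
            # Consume the string terminator (ESC \) if it is there.
            if i + 1 < n and data[i] == ESC and data[i + 1] == 0x5C:
                i += 2
            # Otherwise an invalid byte or end of input ended the
            # string; resume normal processing at data[i].
            continue
        out.append(data[i])
        i += 1
    return bytes(out)
```

SGR sequences pass through untouched, since they open with ESC [ rather
than ESC ].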

> > Because the string terminator ends the escape sequence, the next
> > bytes you read will fall into one of the following exhaustive
> > categories.
> > 
> > 1.  the start of an SGR escape sequence (starts with ESC [);
> > 2.  the start of an OSC 8 escape sequence--if you are already within
> >     link text, the occurrence and therefore nesting of these is
> >     undefined, and I would ignore them;
> > 3.  Unicode Basic Latin code points minus DEL, plus LF, TAB, and FF,
> >     encoded in single bytes;[2] or
> > 4.  a UTF-8 multibyte character sequence (only if GNU troff's output
> >     directed grotty to read the description of the "utf8" device).
> 
> Presumably it's straightforward to process these OSC sequences without
> this list as this is just a list of other constructs that could appear
> in the input.

Yes.  I thought it would be helpful in the discussion context to offer
further guidance to Eli regarding what he can expect grotty to produce.
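
Those four categories can be told apart by looking at no more than two
bytes.  A rough sketch in Python follows; the function name and the
category labels are made up for illustration, and I am reading category
3 as printable ASCII plus TAB, LF, and FF.

```python
def classify(data: bytes, i: int = 0) -> str:
    """Classify what begins at data[i] in terminal-bound output."""
    b = data[i]
    if b == 0x1B:                       # ESC
        nxt = data[i + 1:i + 2]         # empty if ESC is the last byte
        if nxt == b"[":
            return "csi"                # SGR sequences start this way
        if nxt == b"]":
            return "osc"                # OSC 8 sequences start this way
        return "other"
    if b in (0x09, 0x0A, 0x0C) or 0x20 <= b <= 0x7E:
        return "ascii"                  # TAB, LF, FF, or printable
    if 0xC2 <= b <= 0xF4:
        return "utf8-lead"              # first byte of a multibyte sequence
    return "other"
```

Anything landing in "other" falls outside the list and would merit a
diagnostic in a strict reader.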

> Regarding point 2, you appear to be incorrect according to the github
> page link above:
> 
>   The feature was modeled after anchors on webpages. There are some
>   differences though, due to the nature of terminal emulation.
>   
>   An HTML page is supposed to contain balanced and unnested pairs of <a
>   ...> and </a> tags. This is important in order to build up a DOM
>   tree. Terminal emulators don't have this concept. They are a state
>   machine, interpreting the data as it arrives in a stream.
>   
>   As such, in terminal emulators an OSC 8 escape sequence just changes
>   the hyperlink (or lack thereof) to the new value. It is perfectly
>   legal to switch from one hyperlink to another without explicitly
>   closing the first one. It is also perfectly legal to close a
>   hyperlink when it's not actually open (e.g. to make sure to clean up
>   after a potentially unclean exit of an application).
>   
>   You can practically think of the hyperlink as yet another attribute
>   that character cells have, similarly to the foreground and
>   background color, bold, italic, strikethrough etc. bits. It is
>   absolutely valid to switch from one color to another without
>   resetting to the default in between, or to reset to the default
>   multiple times. The same goes for hyperlinks.
> 
> https://gist.github.com/egmontkob/eb114294efbcd5adb1944c9f3cb5feda

I'd call it more of a deliberate omission.  As you quote above, HTML is
block-structured.  ECMA-48 is more stream-oriented.

With respect to what Eli can expect grotty to produce, he need not worry
about hyperlink nesting.  grotty complains if its input attempts it, and
"flattens" the link structure.

$ printf 'Of course it runs \\X"tty: link https://www.gnu.org/software/emacs"E\\X"tty: link https://example.com/"macs\\X"tty: link".\n.pl \\n[nl]u\n' | ~/groff-HEAD/bin/nroff
grotty:<standard input>:27: warning: new hyperlink started without ending previous one; recovering
Of course it runs Emacs.

In that example, grotty arranges for:
"E" to link to https://www.gnu.org/software/emacs,
and "macs" to https://example.com/.
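
The "hyperlink as attribute" model from Egmont's gist makes such a
stream easy to consume: each OSC 8 simply replaces the link in effect.
A small sketch in Python follows, assuming 7-bit ST (ESC \) terminators
and well-formed sequences.  The function name is hypothetical, and any
stream fed to it here would be hand-written, not a capture of grotty's
actual output.

```python
import re

# OSC 8 ; params ; URI ST, with ST spelled ESC \.
OSC8 = re.compile(r"\x1b\]8;([^;\x1b]*);([\x20-\x7e]*)\x1b\\")


def link_runs(text: str):
    """Return a list of (run_of_text, uri_or_None) pairs."""
    runs = []
    current = None                      # link in effect, if any
    pos = 0
    for m in OSC8.finditer(text):
        if m.start() > pos:
            runs.append((text[pos:m.start()], current))
        # A new OSC 8 replaces the current link; an empty URI clears it.
        current = m.group(2) or None
        pos = m.end()
    if pos < len(text):
        runs.append((text[pos:], current))
    return runs
```

Given a stream that opens the Emacs link before "E", opens the
example.com link before "macs" without closing the first, and closes the
link before the period, this returns four runs: the leading text with no
link, "E" with the first URL, "macs" with the second, and the trailing
period with no link.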

Regards,
Branden

[1] 
https://web.archive.org/web/20210701000000*/http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-048.pdf
