Hi Gavin,

At 2026-02-01T11:50:36+0000, Gavin Smith wrote:
> On Sun, Feb 01, 2026 at 05:18:19AM -0600, G. Branden Robinson wrote:
> > I'm happy to explain, but beyond Egmont's "gist" above, ECMA-48
> > is the controlling authority for the structure of these escape
> > sequences.
> >
> > http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-048.pdf
>
> That link didn't work for me.

For me, neither--now.

> This one does:
>
> https://ecma-international.org/wp-content/uploads/ECMA-48_5th_edition_june_1991.pdf

Thanks! I'll fix this in groff's documentation.

According to the Internet Archive's Wayback Machine, ECMA killed the old
URL between 12 January and 25 February 2021.[1]

I apparently added the URL to the grotty(1) man page on 16 January 2020.

https://cgit.git.savannah.gnu.org/cgit/groff.git/commit/?id=4520668e9ec756ecd6486bc5ce937809e1b4f543

So, let it be known to all: if you need to kill a stable URL anywhere on
the Internet but lack authority to do so, just get me to add it to groff
documentation. It'll be dead in a year.

> What's missing from your explanation is the set of bytes that can
> occur in such a sequence.
>
> From the ECMA link above:
>
>   OSC is used as the opening delimiter of a control string for
>   operating system use. The command string following may consist of a
>   sequence of bit combinations in the range 00/08 to 00/13 and 02/00
>   to 07/14. The control string is closed by the terminating delimiter
>   STRING TERMINATOR (ST). The interpretation of the command string
>   depends on the relevant operating system
>
> These "bit combinations" would be more usually described as bytes 0x08
> to 0x0d and 0x20 to 0x7e. Other bytes are invalid (e.g. non-ASCII
> UTF-8).

Thanks for correcting my omission. I was focussing more on describing
what Eli could expect grotty to emit than on undertaking an exegesis of
Egmont's OSC 8 specification per se. If further formalized, OSC 8 could
use some tightening up.

> If your only intention is to strip out and ignore such sequences, you
> can ignore the syntax involving semicolons and key-value pairs, and
> just skip over the sequence of permissible bytes.

Egmont's examples, however, repeatedly show a pair of semicolons between
the "8" and the URL. I interpreted this as recommended practice, albeit
not normative.

> > 5c. _Optionally_, you can treat a BEL (C-g) as equivalent to a
> > string terminator. This practice is outside of the ECMA-48
> > specification, but is sometimes produced by applications targeting
> > "color xterms" of the 1990s written by people who lacked access to,
> > or ignored, ECMA-48, and clunkily implemented SGR support. I would
> > not support this practice for OSC 8; I know of nothing that produces
> > such ill-terminated sequences for its much newer convention. I
> > wouldn't even mention it, except that I fear that some terminal
> > emulator developer who spends more effort on promotional activities
> > than on ensuring code quality will bring it up.
>
> BEL (0x07) is also an invalid byte to occur within the sequence, so if
> not treated specially as a terminator, should be treated as invalid
> input (which would presumably terminate the OSC processing).

I agree, and if I were writing an OSC 8 interpreter, that's what I would
do.

> > Because the string terminator ends the escape sequence, the next
> > bytes you read will fall into one of the following exhaustive
> > categories.
> >
> > 1. the start of an SGR escape sequence (starts with ESC [);
> > 2. the start of an OSC 8 escape sequence--if you are already within
> >    link text, the occurrence and therefore nesting of these is
> >    undefined, and I would ignore them;
> > 3. Unicode Basic Latin code points minus DEL, plus LF, TAB, and FF,
> >    encoded in single bytes;[2] or
> > 4. a UTF-8 multibyte character sequence (only if GNU troff's output
> >    directed grotty to read the description of the "utf8" device).
>
> Presumably it's straightforward to process these OSC sequences without
> this list as this is just a list of other constructs that could appear
> in the input.

Yes. I thought it would be helpful in the discussion context to offer
further guidance to Eli regarding what he can expect grotty to produce.
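
To make the "just skip over the permissible bytes" approach concrete,
here is a minimal sketch in Python. It is an illustration only, not code
from groff or any terminal emulator. The names `is_command_byte` and
`strip_osc` are made up for this message. It assumes the 7-bit forms of
OSC (ESC ]) and ST (ESC \), and it assumes that on hitting an invalid
byte such as BEL the stripper abandons the sequence and hands that byte
back to ordinary processing; other recovery policies are defensible.

```python
ESC = 0x1B


def is_command_byte(b: int) -> bool:
    """True for bytes ECMA-48 permits inside an OSC command string."""
    return 0x08 <= b <= 0x0D or 0x20 <= b <= 0x7E


def strip_osc(data: bytes) -> bytes:
    """Remove OSC control strings; pass every other byte through."""
    out = bytearray()
    i, n = 0, len(data)
    while i < n:
        # 7-bit OSC introducer: ESC ]
        if data[i] == ESC and i + 1 < n and data[i + 1] == 0x5D:
            j = i + 2
            # Skip the command string without parsing ";" or key=value.
            while j < n and is_command_byte(data[j]):
                j += 1
            # Properly terminated by ST (ESC \): drop the terminator too.
            if j + 1 < n and data[j] == ESC and data[j + 1] == 0x5C:
                j += 2
            # Otherwise an invalid byte (BEL, non-ASCII, ...) or the end
            # of input stopped the scan; ordinary processing resumes at
            # that byte (an assumption of this sketch).
            i = j
            continue
        out.append(data[i])
        i += 1
    return bytes(out)
```

SGR sequences (ESC [) are deliberately left alone, since only OSC
control strings are in scope here.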

> Regarding point 2, you appear to be incorrect according to the github
> page link above:
>
>   The feature was modeled after anchors on webpages. There are some
>   differences though, due to the nature of terminal emulation.
>
>   An HTML page is supposed to contain balanced and unnested pairs of
>   <a ...> and </a> tags. This is important in order to build up a DOM
>   tree. Terminal emulators don't have this concept. They are a state
>   machine, interpreting the data as it arrives in a stream.
>
>   As such, in terminal emulators an OSC 8 escape sequence just changes
>   the hyperlink (or lack thereof) to the new value. It is perfectly
>   legal to switch from one hyperlink to another without explicitly
>   closing the first one. It is also perfectly legal to close a
>   hyperlink when it's not actually open (e.g. to make sure to clean up
>   after a potentially unclean exit of an application).
>
>   You can practically think of the hyperlink as yet another attribute
>   that character cells have, similarly to the foreground and
>   background color, bold, italic, strikethrough etc. bits. It is
>   absolutely valid to switch from one color to another without
>   resetting to the default in between, or to reset to the default
>   multiple times. The same goes for hyperlinks.
>
> https://gist.github.com/egmontkob/eb114294efbcd5adb1944c9f3cb5feda

I'd call it more of a deliberate omission. As you quote above, HTML is
block-structured. ECMA-48 is more stream-oriented.

With respect to what Eli can expect grotty to produce, he need not worry
about hyperlink nesting. grotty complains if its input attempts it, and
"flattens" the link structure.

$ printf 'Of course it runs \\X"tty: link https://www.gnu.org/software/emacs"E\\X"tty: link https://example.com/"macs\\X"tty: link".\n.pl \\n[nl]u\n' | ~/groff-HEAD/bin/nroff
grotty:<standard input>:27: warning: new hyperlink started without ending previous one; recovering
Of course it runs Emacs.

In that example, grotty arranges for "E" to link to
https://www.gnu.org/software/emacs, and "macs" to https://example.com/.

Regards,
Branden

[1] https://web.archive.org/web/20210701000000*/http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-048.pdf
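
P.S. The stream-oriented view of hyperlinks discussed above, where the
link is just another attribute of the text that follows, can be sketched
in a few lines of Python. This is a hypothetical illustration, not how
grotty or any terminal emulator is implemented. The names `OSC8` and
`link_spans` are invented here, and the sketch assumes the 7-bit ESC ]
and ESC \ forms and an empty URI meaning "no link".

```python
import re
from typing import List, Optional, Tuple

# OSC 8 ; params ; URI ST, in the 7-bit forms ESC ] and ESC \.
# params may not contain ";"; the URI may be empty (closes the link).
OSC8 = re.compile(r"\x1b\]8;([\x20-\x3a\x3c-\x7e]*);([\x20-\x7e]*)\x1b\\")


def link_spans(text: str) -> List[Tuple[str, Optional[str]]]:
    """Split text into (chunk, url-or-None) pairs, state-machine style."""
    spans: List[Tuple[str, Optional[str]]] = []
    current: Optional[str] = None  # the hyperlink "attribute" in effect
    pos = 0
    for m in OSC8.finditer(text):
        chunk = text[pos:m.start()]
        if chunk:
            spans.append((chunk, current))
        # A new OSC 8 simply replaces the current link; no nesting and
        # no requirement that the previous link was closed first.
        current = m.group(2) or None
        pos = m.end()
    if text[pos:]:
        spans.append((text[pos:], current))
    return spans
```

Fed a hand-written stream in which a second link starts before the first
is closed, this attributes each run of text to whichever link was set
most recently, and closing a link that is not open is harmless.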
