On Sun, Feb 01, 2026 at 05:18:19AM -0600, G. Branden Robinson wrote:
> I'm happy to explain, but beyond Egmont's "gist" above, ECMA-48 is
> the controlling authority for the structure of these escape sequences.
> 
> http://www.ecma-international.org/publications/files/ECMA-ST/Ecma-048.pdf

That link didn't work for me.  This one does:

https://ecma-international.org/wp-content/uploads/ECMA-48_5th_edition_june_1991.pdf


> 
> If there were a semantic convention for "OSC 7" and/or "OSC 8", we'd
> expect them to follow a similar format as the foregoing.
> 
> You can expect to see the following:
> 
> 1.  ESC
> 2.  ]
> 
> These bytes select an "operating system command (OSC)".
> 
> 3.  8
> 
> This byte selects the semantics the operating system command will use.
> Only convention governs these.  I followed Egmont's proposed spec,
> linked above, as closely as I understood it.
> 
> 4.  ;
> 
> The semicolon begins, and separates, variable-length data items.  I
> would add subsequent non-semicolon characters to a queue.

What's missing from your explanation is the set of bytes that can
occur in such a sequence.

From the ECMA link above:

  OSC is used as the opening delimiter of a control string for operating
  system use. The command string following may consist of a sequence
  of bit combinations in the range 00/08 to 00/13 and 02/00 to 07/14.
  The control string is closed by the terminating delimiter STRING
  TERMINATOR (ST). The interpretation of the command string depends on
  the relevant operating system.

These "bit combinations" would more usually be described as bytes 0x08
to 0x0d and 0x20 to 0x7e.  Other bytes are invalid (e.g. the bytes of
non-ASCII UTF-8 sequences).
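
As a rough illustration of that range check, here is a minimal Python
sketch.  The function name is made up for this example; only the two
byte ranges come from ECMA-48.

```python
def is_osc_body_byte(b: int) -> bool:
    """Return True if byte value b may appear inside an ECMA-48
    command string (00/08 to 00/13, or 02/00 to 07/14)."""
    return 0x08 <= b <= 0x0D or 0x20 <= b <= 0x7E


# ESC (0x1b), BEL (0x07), DEL (0x7f) and every byte >= 0x80 fall outside
# both ranges, so none of them can be part of the command string.
```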

If your only intention is to strip out and ignore such sequences, you
can ignore the syntax involving semicolons and key-value pairs, and just
skip over the sequence of permissible bytes.
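
A minimal Python sketch of that approach follows.  It is only an
illustration under some assumptions of mine: the name strip_osc is
hypothetical, only the 7-bit form of ST (ESC \) is recognised, and on an
invalid byte or truncated input the partial sequence is simply dropped
and normal processing resumes at the offending byte.

```python
ESC = 0x1B


def strip_osc(data: bytes) -> bytes:
    """Remove OSC sequences (ESC ] ... ESC \\) from data."""
    out = bytearray()
    i, n = 0, len(data)
    while i < n:
        if data[i] == ESC and i + 1 < n and data[i + 1] == ord("]"):
            j = i + 2
            # Skip every byte that ECMA-48 permits in a command string.
            while j < n and (0x08 <= data[j] <= 0x0D or 0x20 <= data[j] <= 0x7E):
                j += 1
            # Consume the string terminator if that is what stopped us.
            if j + 1 < n and data[j] == ESC and data[j + 1] == ord("\\"):
                j += 2
            # Otherwise (assumption): abandon the sequence and resume at j.
            i = j
        else:
            out.append(data[i])
            i += 1
    return bytes(out)
```

For example, strip_osc(b"a\x1b]8;;http://example.com\x1b\\link\x1b]8;;\x1b\\b")
returns b"alinkb"; the semicolons and the URI are never inspected.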

> 5c.  _Optionally_, you can treat a BEL (C-g) as equivalent to a string
> terminator.  This practice is outside of the ECMA-48 specification, but
> is sometimes produced by applications targeting "color xterms" of the
> 1990s written by people who lacked access to, or ignored, ECMA-48, and
> clunkily implemented SGR support.  I would not support this practice for
> OSC 8; I know of nothing that produces such ill-terminated sequences for
> its much newer convention.  I wouldn't even mention it, except that I
> fear that some terminal emulator developer who spends more effort on
> promotional activities than on ensuring code quality will bring it up.

BEL (0x07) is also not a valid byte within the sequence, so if it is not
treated specially as a terminator, it should be treated as invalid input
(which would presumably terminate the OSC processing).
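
To make the two choices concrete, here is a small Python sketch that
classifies the byte at which a scan of the command string stopped.  The
function name, the flag and the returned labels are all hypothetical;
what a parser does with an "invalid" result is left open.

```python
def classify_osc_stop(data: bytes, j: int, bel_is_terminator: bool = False) -> str:
    """Classify the byte at index j that ended a run of valid OSC bytes."""
    if j >= len(data):
        return "truncated"              # input ended inside the sequence
    if data[j] == 0x1B and j + 1 < len(data) and data[j + 1] == 0x5C:
        return "st"                     # ESC \ : proper string terminator
    if data[j] == 0x07 and bel_is_terminator:
        return "bel"                    # xterm-style BEL terminator
    return "invalid"                    # anything else, including BEL


# With the default (strict ECMA-48) setting a BEL is just invalid input.
```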

> Because the string terminator ends the escape sequence, the next bytes
> you read will fall into one of the following exhaustive categories.
> 
> 1.  the start of an SGR escape sequence (starts with ESC [);
> 2.  the start of an OSC 8 escape sequence--if you are already within
>     link text, the occurrence and therefore nesting of these is
>     undefined, and I would ignore them;
> 3.  Unicode Basic Latin code points minus DEL, plus LF, TAB, and FF,
>     encoded in single bytes;[2] or
> 4.  a UTF-8 multibyte character sequence (only if GNU troff's output
>     directed grotty to read the description of the "utf8" device).

Presumably these OSC sequences can be processed without reference to
this list, as it only enumerates the other constructs that could appear
in the input.

Regarding point 2, you appear to be incorrect according to the GitHub
gist (linked below):

  The feature was modeled after anchors on webpages. There are some
  differences though, due to the nature of terminal emulation.
  
  An HTML page is supposed to contain balanced and unnested pairs of <a ...>
  and </a> tags. This is important in order to build up a DOM tree. Terminal
  emulators don't have this concept. They are a state machine, interpreting
  the data as it arrives in a stream.
  
  As such, in terminal emulators an OSC 8 escape sequence just changes
  the hyperlink (or lack thereof) to the new value. It is perfectly legal
  to switch from one hyperlink to another without explicitly closing the
  first one. It is also perfectly legal to close a hyperlink when it's
  not actually open (e.g. to make sure to clean up after a potentially
  unclean exit of an application).
  
  You can practically think of the hyperlink as yet another attribute that
  character cells have, similarly to the foreground and background color,
  bold, italic, strikethrough etc. bits. It is absolutely valid to switch
  from one color to another without resetting to the default in between,
  or to reset to the default multiple times. The same goes for hyperlinks.

https://gist.github.com/egmontkob/eb114294efbcd5adb1944c9f3cb5feda
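
The "hyperlink as a character attribute" model described there can be
sketched in a few lines of Python.  This is only an illustration: the
names OSC8 and link_runs are made up, the input is assumed to be already
decoded text, only the ESC \ terminator is recognised, and the params
field is matched but ignored.

```python
import re

# ESC ] 8 ; params ; URI ESC \   (params and URI may both be empty)
OSC8 = re.compile(r"\x1b\]8;([^;\x1b]*);([^\x1b]*)\x1b\\")


def link_runs(text):
    """Split text into (chunk, uri) pairs, where uri is None outside links."""
    runs = []
    current = None          # hyperlink attribute currently in effect
    pos = 0
    for m in OSC8.finditer(text):
        if m.start() > pos:
            runs.append((text[pos:m.start()], current))
        # Each OSC 8 just replaces the attribute; an empty URI clears it.
        current = m.group(2) or None
        pos = m.end()
    if pos < len(text):
        runs.append((text[pos:], current))
    return runs
```

Switching directly from one link to another, or closing a link twice,
needs no special handling here: each sequence only overwrites "current".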
