On 29 Aug 2025, at 12:37, Daniel Mendler <[email protected]> wrote:
>> Exactly, and this is not an uncommon bug in string-mutating code: `aset` on
>> unibyte strings 'worked' for ASCII and Unicode above 255, but not for
>> 128..255.
>> Thus the org-habit code was basically always broken in this sense. The recent
>> mutation reform is an attempt to straighten things up.
>
> Well, but it worked before with your string resizing code?

No, you always got the \257 raw byte instead of an actual `·`. (Unless, maybe, your buffer was encoded in latin-1.)

> According to my understanding, one can still define sensible indices and
> access functions. Either use code point or graphemes as units. But access
> won't be O(1) if the string is stored in bytes as the underlying unit.
> Accessing UTF-8 strings via byte indices indeed makes no sense, like one
> might do in C when mutating a char[] array.

Actually, byte indexing would make sense, but not for accessing individual bytes. Ideally the index wouldn't be a plain number but a distinct Lisp type. Such indices could be obtained from iterating, searching and pattern matching, and used to extract substrings or individual code points.

Whether that is a sufficient improvement to merit a parallel string API in addition to the position-based one is a different matter.

> I think there could be some potential to introduce frozen (immutable)
> objects, in the light of potential garbage collector optimization.

Yes, it's been discussed. Specifically, making some strings immutable would indeed be useful, but there are quite a few technicalities here.

> Frozen symbol names would resolve the crash issue. Ruby went such a
> route with its string literals - they started out mutable, became
> optionally frozen via a magic frozen_string_literal comment (similar to
> our lexical-binding cookie) and are frozen by default in more recent
> language versions.

Actually, string literals have long been considered de facto immutable in Elisp, although this isn't actively enforced. The byte compiler warns about some simple cases.
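To make the opaque-index idea a bit more concrete, here is a minimal sketch. It is written in Python rather than Elisp purely for brevity, and every name in it (`StrIndex`, `Utf8Str`, `next_index`, `indices`, `find`, `char_at`, `substring`, `char_position`) is hypothetical, not an existing Emacs API. It assumes plain UTF-8 storage; Emacs's internal multibyte representation is a superset of UTF-8 that also encodes raw bytes, which this sketch ignores. Indices wrap a byte offset, are only handed out by iteration and searching, and locate a code point without scanning from the start; converting one back to a code-point position still takes a linear scan.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass(frozen=True)
class StrIndex:
    """Hypothetical opaque string index: wraps a byte offset that always
    falls on a code-point boundary.  Callers are not meant to see the int."""
    _byte: int


class Utf8Str:
    """Hypothetical immutable string stored as UTF-8 bytes."""

    def __init__(self, text: str) -> None:
        self._data = text.encode("utf-8")

    def start(self) -> StrIndex:
        return StrIndex(0)

    def end(self) -> StrIndex:
        return StrIndex(len(self._data))

    def next_index(self, i: StrIndex) -> StrIndex:
        # Step over the lead byte, then over any continuation bytes (0b10xxxxxx).
        j = i._byte + 1
        while j < len(self._data) and (self._data[j] & 0xC0) == 0x80:
            j += 1
        return StrIndex(j)

    def indices(self) -> Iterator[StrIndex]:
        # Indices come from iteration, never from arithmetic on integers.
        i = self.start()
        while i._byte < len(self._data):
            yield i
            i = self.next_index(i)

    def find(self, needle: str) -> Optional[StrIndex]:
        # A byte search for a valid UTF-8 needle can only match at a
        # code-point boundary, so the result is always a valid index.
        k = self._data.find(needle.encode("utf-8"))
        return None if k < 0 else StrIndex(k)

    def char_at(self, i: StrIndex) -> str:
        # No scan from the start: the index already points at the code point.
        return self._data[i._byte:self.next_index(i)._byte].decode("utf-8")

    def substring(self, a: StrIndex, b: StrIndex) -> str:
        return self._data[a._byte:b._byte].decode("utf-8")

    def char_position(self, i: StrIndex) -> int:
        # Converting back to a code-point position is O(n): count lead bytes.
        return sum(1 for b in self._data[:i._byte] if (b & 0xC0) != 0x80)


s = Utf8Str("a·b")
dot = s.find("·")
print(s.char_at(dot))                         # -> ·
print(s.substring(s.start(), dot))            # -> a
print([s.char_at(i) for i in s.indices()])    # -> ['a', '·', 'b']
print(s.char_position(s.find("b")))           # -> 2 (byte offset is 3)
```

Since callers are expected to obtain a `StrIndex` only from `indices` or `find` (Python cannot truly hide the constructor, so this is by convention), an index never points into the middle of a multibyte character such as `·`.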
