Mattias Engdegård <mattias.engdeg...@gmail.com> writes:

> On 29 Aug 2025, at 12:37, Daniel Mendler <m...@daniel-mendler.de> wrote:
>
>>> Exactly, and this is not an uncommon bug in string-mutating code: `aset` on
>>> unibyte strings 'worked' for ASCII and Unicode above 255, but not for
>>> 128..255. Thus the org-habit code was basically always broken in this
>>> sense. The recent mutation reform is an attempt to straighten things up.
>> 
>> Well, but it worked before with your string resizing code?
>
> No, you always got the \257 raw byte instead of an actual `·`.
> (Unless, maybe, your buffer were encoded in latin-1.)

Well, in the agenda buffer the habit graph looked correct when I used
`·`, but only before the string resizing. I haven't configured Latin-1;
it should all be Unicode. I am not sure what was going on.

>> According to my understanding, one can still define sensible indices and
>> access functions. Either use code points or graphemes as units. But access
>> won't be O(1) if the string is stored in bytes as the underlying unit.
>> Accessing UTF-8 strings via byte indices indeed makes no sense, like one
>> might do in C when mutating a char[] array.
>
> Actually byte indexing would make sense but not for accessing
> individual bytes. Ideally the index wouldn't be a plain number but a
> distinct Lisp type. Such indices could be obtained from iterating,
> searching and pattern matching, and used to extract substrings or
> individual code points.

Sure, an opaque iterator type is another reasonable alternative, but
perhaps one better suited to a language implementation started from
scratch. It might not integrate so well into Elisp.
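
To make the difference between code-point indices and byte indices
concrete, here is a minimal sketch using only the existing
position-based API. The variable names `s` and `encoded` are just for
illustration, and I build the string with `string` so the example does
not depend on how the source file is encoded:

```elisp
;; Illustrative sketch only; `s' and `encoded' are example names.
(let* ((s (string ?a #xb7 ?b))                    ; "a·b" as a multibyte string
       (encoded (encode-coding-string s 'utf-8))) ; the same text as unibyte UTF-8
  (list (length s)          ; number of characters (code points)
        (string-bytes s)    ; number of bytes in the internal representation
        (aref s 1)          ; character at code-point index 1, i.e. `·'
        (length encoded)    ; number of bytes after UTF-8 encoding
        (aref encoded 1)))  ; byte at index 1: first byte of the encoded `·'
;; => (3 4 183 4 194)
```

Index 1 means "the second character" in the multibyte string but only
"the second byte" in the encoded one, which is why a plain number is
ambiguous as a byte index.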

> Whether it's a sufficient improvement to merit a parallel string API
> in addition to the position-based one is a different matter.
>
>> I think there could be some potential to introduce frozen (immutable)
>> objects, in the light of potential garbage collector optimization.
>
> Yes, it's been discussed. Specifically making some strings immutable
> would indeed be useful but there are quite a few technicalities here.

What about other objects? Do you see runtime benefits there, besides
preventing bugs, if we can enforce immutability?

>> Frozen symbol names would resolve the crash issue. Ruby went such a
>> route with its string literals - they started out mutable, became
>> optionally frozen via a magic frozen_string_literal comment (similar to
>> our lexical-binding cookie) and are frozen by default in more recent
>> language versions.
>
> Actually string literals have long been considered de facto immutable
> in Elisp although this isn't actively enforced. The byte compiler
> warns about some simple cases.

Personally I have never used string mutation in Elisp, so I can only
agree. :)

Daniel
