Mattias Engdegård <[email protected]> writes:

> On 29 Aug 2025, at 03:39, Daniel Mendler <[email protected]> wrote:
>
>> Yes, org-habit with Unicode characters works well again with your patch.
>
> Thank you, now pushed to Emacs master. I trust the Org people to sync as 
> needed.

Thank you!

>>> (By the way, did the old code work if you used the glyph `·`?)
>> 
>> If by old code you mean the code before your patch, but without the aset
>> string resizing - it works somewhat. The strings are modified
>> successfully, but the end result in the agenda buffer looks broken, with
>> some \257 characters.
>
> Exactly, and this is not an uncommon bug in string-mutating code: `aset` on
> unibyte strings 'worked' for ASCII and Unicode above 255, but not for
> 128..255. Thus the org-habit code was basically always broken in this sense.
> The recent mutation reform is an attempt to straighten things up.

Well, but it worked before with your string resizing code?

> In fact, mutating strings by replacing one character with another doesn't
> really make much sense with Unicode. Not only do we have variable-width
> encodings (UTF-8) but what a user sees as a character can be multiple unicode
> codes (scalar values). Thus if org-habit were written today, I'd recommend
> org-habit-today-glyph etc to be strings, not characters. (That way arbitrary
> emojis would work, but I suppose that isn't much of a feature.)

According to my understanding, one can still define sensible indices and
access functions, using either code points or graphemes as units. But access
won't be O(1) if the string is stored with bytes as the underlying unit.
Accessing UTF-8 strings via byte indices indeed makes no sense, as one
might do in C when mutating a char[] array.
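
To illustrate the three possible units, here is a small sketch in Ruby
(chosen only because I mention it below; the sample string and the variable
names are made up for illustration). It counts the same string in bytes,
code points and graphemes, and shows that a byte index can land in the
middle of a character:

```ruby
# 'a', MIDDLE DOT (U+00B7), then 'e' followed by COMBINING ACUTE ACCENT (U+0301).
s = "a\u00B7e\u0301"

byte_count      = s.bytesize                    # length in UTF-8 bytes
codepoint_count = s.codepoints.length           # length in Unicode scalar values
graphemes       = s.each_grapheme_cluster.to_a  # user-perceived characters

# Indexing by code point yields a whole character ...
second_char = s[1]

# ... whereas slicing by byte index cuts the two-byte middle dot in half,
# leaving a fragment that is not valid UTF-8.
fragment = s.byteslice(1, 1)

puts "bytes=#{byte_count} codepoints=#{codepoint_count} graphemes=#{graphemes.length}"
puts "fragment valid? #{fragment.valid_encoding?}"
```

The grapheme count is the one that matches what a user sees, which is
presumably why glyph options are better off as strings than as single
characters.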

>> Agree. Buffers are a bit expensive though. I would only use them if more
>> complex transformations are needed.
>
> Yes, piecing strings together is usually both faster and cleaner than using 
> buffers.
>
>> Yes, it is a good direction. I am all for making Elisp strings immutable
>> given that we have vectors and buffers. Emacs even crashes when using
>> aset on symbol-names, which I had reported a while ago.
>
> Ugh, yes, that's an embarrassment.

I think there is a case for introducing frozen (immutable) objects, also
in light of possible garbage collector optimizations. Frozen symbol
names would resolve the crash issue. Ruby went such a route with its
string literals - they started out mutable, became optionally frozen via
a magic frozen_string_literal comment (similar to our lexical-binding
cookie), and recent versions are moving toward frozen by default (Ruby
3.4 already warns when a literal is mutated).
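
For reference, a minimal sketch of what a frozen string looks like from the
Ruby side. I use an explicit `.freeze` here so the example does not depend
on the Ruby version or on where the magic comment sits in the file; the
variable names are made up:

```ruby
# An explicitly frozen string. Under `# frozen_string_literal: true`
# a plain literal is in the same state without the call to `freeze`.
name = "habit".freeze

# In-place mutation of a frozen string raises FrozenError (Ruby 2.5+).
error = begin
  name << "!"
  nil
rescue FrozenError => e
  e
end

# To modify the text, work on an unfrozen copy instead.
copy = name.dup
copy << "!"

puts "mutation raised: #{error.class}"
puts "original: #{name}, copy: #{copy}"
```

The point is that mutation becomes a clean, catchable error instead of
undefined behaviour, which is what I would hope for with `aset` on symbol
names.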

Daniel
