Jacques Menu <[email protected]> writes:

> Hello David,
>
> Maybe this is totally stupid, but would it be meaningful to pick a
> Guile 2 version, fix the issues in string implementation and design,
> and freeze that fixed version for Lily’s own use, without depending
> on Andy Wingo’s work for some time?

That "fix" would consist of making Guile strings UTF-8 only and throwing
out everything else.  The problem is that Scheme has in-place string
manipulations (string-set!, for example) that don't work with
variable-size characters.  It may be that recent Scheme standards have
tried to become more UTF-8 friendly, but I have no idea.
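
To make the in-place problem concrete, here is a minimal sketch in Python
(not Scheme, just for illustration).  The helper names char_span and
set_char_utf8 are made up for this example.  It mimics what an in-place
character replacement would have to do on a UTF-8 buffer: overwriting a
1-byte character with a 2-byte one changes the buffer length and moves
everything behind it.

```python
def char_span(buf, index):
    """Return (start, end) byte offsets of character number `index` in UTF-8 `buf`."""
    # Every byte that is not a continuation byte (10xxxxxx) starts a character.
    starts = [i for i, b in enumerate(buf) if (b & 0xC0) != 0x80]
    starts.append(len(buf))
    return starts[index], starts[index + 1]


def set_char_utf8(buf, index, ch):
    """Replace character `index` in the bytearray; return the change in byte length."""
    start, end = char_span(buf, index)
    new = ch.encode("utf-8")
    buf[start:end] = new          # may grow or shrink the buffer
    return len(new) - (end - start)


buf = bytearray("naive".encode("utf-8"))   # 5 characters, 5 bytes
shift = set_char_utf8(buf, 2, "ï")         # 'i' (1 byte) becomes 'ï' (2 bytes)
print(buf.decode("utf-8"), len(buf), shift)
```

With a fixed-width representation the same replacement is a single
store; here it is a scan plus a memory move, which is why in-place
mutation and UTF-8 don't get along.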

A beyond-LilyPond fix would turn the internal string encoding into some
extension of UTF-8, like Emacs does.  That is actually also a
prerequisite for making something like Guilemacs ever take off.  Getting
Emacs' string implementation to its current quality took decades.

A few salient points:

Code points for large characters are supported (at one point 32-bit
characters, but this may have been reduced to the theoretical maximum
for 4-byte characters these days).  Out-of-sequence bytes from 128–255
are represented by 2-byte sequences (the overlong representations of
0–127, which are not valid UTF-8), and valid UTF-8 is represented as
itself.  That makes any binary data representable with a blow-up factor
of at most 2.
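
A rough Python sketch of that scheme, as I read the description above.
This is not Emacs' actual code, and the names to_internal, from_internal
and the two small helpers are invented for the example.  Valid UTF-8
passes through unchanged; every stray byte b in 128–255 becomes the
overlong 2-byte form of b - 128, which always starts with 0xC0 or 0xC1.
Those two lead bytes never occur in valid UTF-8, so decoding is
unambiguous.

```python
def utf8_len(lead):
    """Length of a UTF-8 sequence announced by its lead byte, 0 if not a valid lead."""
    if lead < 0x80:
        return 1
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 0


def is_valid_utf8(chunk):
    try:
        chunk.decode("utf-8")     # strict: rejects overlong forms and surrogates
        return True
    except UnicodeDecodeError:
        return False


def to_internal(data):
    """Encode arbitrary bytes; output is at most twice as long as the input."""
    out = bytearray()
    i = 0
    while i < len(data):
        n = utf8_len(data[i])
        chunk = data[i:i + n]
        if n and len(chunk) == n and is_valid_utf8(chunk):
            out += chunk          # valid UTF-8 is represented as itself
            i += n
        else:
            b = data[i]           # stray byte, necessarily in 128..255
            out += bytes([0xC0 | ((b >> 6) & 1), 0x80 | (b & 0x3F)])
            i += 1
    return bytes(out)


def from_internal(data):
    """Inverse of to_internal: recover the original bytes."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] in (0xC0, 0xC1):   # overlong pair standing for one raw byte
            out.append(0x80 | ((data[i] & 1) << 6) | (data[i + 1] & 0x3F))
            i += 2
        else:
            out.append(data[i])
            i += 1
    return bytes(out)


raw = b"caf\xc3\xa9 \xff\x80"     # valid UTF-8 followed by two stray bytes
enc = to_internal(raw)
print(enc, from_internal(enc) == raw)
```

The worst case is input consisting only of stray bytes, each of which
turns into two bytes; hence the factor of 2.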

With Emacs, the details are in the various encoding itty-bitties: the
internal processing is comparatively straightforward.  Buffers are
addressed by character positions, not bytes.
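
Addressing by character position on top of a UTF-8 buffer means that
somebody has to translate positions into byte offsets.  A naive Python
sketch of that translation follows; the function name char_to_byte is
invented here, and a linear scan is only the simplest possible approach,
not a claim about how Emacs does it.

```python
def char_to_byte(buf, charpos):
    """Byte offset of the character at 0-based position `charpos` in UTF-8 `buf`."""
    seen = 0
    for offset, b in enumerate(buf):
        if (b & 0xC0) != 0x80:    # ASCII or lead byte: a new character starts here
            if seen == charpos:
                return offset
            seen += 1
    if seen == charpos:
        return len(buf)           # position just past the last character
    raise IndexError(charpos)


text = "Füße".encode("utf-8")     # 4 characters, 6 bytes
print([char_to_byte(text, i) for i in range(5)])
```

The scan costs time proportional to the position, so a real
implementation would want some way of avoiding a rescan from the start
on every access.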

The itty-bitty details mostly concern conversion into/out of the
internal UTF-8 though and don't have much of an impact on the normal
processing.

-- 
David Kastrup
