Jacques Menu writes:

> Hello David,
>
> Maybe this is totally stupid, but would it be meaningful to pick a
> Guile 2 version, fix the issues in string
> implementation and design, and freeze that fixed version for Lily’s
> own use, without depending on Andy Wingo’s work for some time?
That "fix" would consist of making Guile strings UTF-8 only and throwing out everything else. The problem is that Scheme has in-place string manipulations (string-set! and friends) that don't work with variable-size characters: replacing a single character can change the byte length of the whole string. It may be that recent Scheme standards have tried to become more UTF-8 friendly; I have no idea.

A fix that goes beyond LilyPond's needs would turn the internal string encoding into some extension of UTF-8, as Emacs does. That is actually also a prerequisite for something like Guilemacs ever taking off. Getting Emacs' string implementation to its current quality took decades. A few key points:

Code points for large characters are supported (at one point 32-bit characters, though these days it may be reduced to the theoretical maximum that fits in a 4-byte sequence). Stray bytes in the range 128–255 that are not part of a valid UTF-8 sequence are represented as 2-byte sequences (overlong representations of 0–127, which are not valid UTF-8), while valid UTF-8 is represented as itself. That makes any binary data representable with a blow-up factor of at most 2.

With Emacs, the itty-bitty details are in the various encodings, and they mostly concern conversion into and out of the internal UTF-8. They don't have much of an impact on normal processing, which is comparatively straightforward. Buffers are addressed by character positions, not bytes.

-- David Kastrup
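A minimal Python sketch of the raw-byte scheme described above, for readers who want to see why the blow-up is bounded by 2. It assumes the 2-byte form for a stray byte b is a lead byte 0xC0 or 0xC1 (carrying bit 6 of b) followed by a continuation byte carrying the low six bits of b; that matches the "overlong 0–127" description, but the exact layout here is an assumption, not a transcription of Emacs' C code. The names `encode_internal` and `decode_internal` are hypothetical.

```python
def _valid_utf8_len(data: bytes, i: int) -> int:
    """Length of the valid UTF-8 sequence starting at data[i], or 0 if none.

    Python's strict UTF-8 codec rejects overlongs, surrogates and truncated
    sequences, so we simply try slices of 1 to 4 bytes.
    """
    for k in range(1, 5):
        try:
            if len(data[i:i + k].decode("utf-8")) == 1:
                return k
        except UnicodeDecodeError:
            continue
    return 0


def encode_internal(data: bytes) -> bytes:
    """Map arbitrary bytes into an Emacs-style internal string (sketch)."""
    out = bytearray()
    i = 0
    while i < len(data):
        n = _valid_utf8_len(data, i)
        if n:
            # Valid UTF-8 is represented as itself.
            out += data[i:i + n]
            i += n
        else:
            # A stray byte (always 0x80-0xFF, since ASCII is valid UTF-8)
            # becomes an overlong 2-byte form: 0xC0/0xC1 + continuation byte.
            b = data[i]
            out.append(0xC0 | ((b >> 6) & 0x01))
            out.append(0x80 | (b & 0x3F))
            i += 1
    return bytes(out)


def decode_internal(internal: bytes) -> bytes:
    """Invert encode_internal, assuming well-formed internal input."""
    out = bytearray()
    i = 0
    while i < len(internal):
        b = internal[i]
        if b in (0xC0, 0xC1):
            # 0xC0/0xC1 never occur in valid UTF-8, so this must be a raw byte.
            out.append(0x80 | ((b & 0x01) << 6) | (internal[i + 1] & 0x3F))
            i += 2
        else:
            out.append(b)
            i += 1
    return bytes(out)
```

Because each input byte yields at most two output bytes and valid UTF-8 passes through unchanged, `encode_internal(bytes(range(256)))` round-trips through `decode_internal` and stays within twice the input length.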
