I have a compromise proposal, which could be implemented for 2.0.x:

We keep wide (UTF-32) stringbufs as-is, but we change narrow stringbufs
to UTF-8, each with a flag that indicates whether it is known to be
ASCII-only.

Applying string-ref or string-set! to a narrow stringbuf would upgrade
it to a wide stringbuf, unless it is known to be ASCII-only.  Better
yet, string-ref should do this only when the index is above a certain
threshold value, and string-set! should do this only for stringbufs
longer than a certain threshold length.
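
To make the upgrade rule concrete, here is a rough C sketch of the
string-ref side.  It is not Guile's actual stringbuf code: the struct
layout, the names (stringbuf_from_utf8, stringbuf_ref, widen) and the
threshold value are all made up for illustration, and it assumes the
input is valid UTF-8.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical threshold; the proposal does not name a value. */
#define REF_WIDEN_THRESHOLD 8

typedef struct {
    bool is_wide;     /* true: UTF-32 in u.wide; false: UTF-8 in u.narrow */
    bool ascii_only;  /* meaningful only for narrow stringbufs */
    size_t len;       /* code points if wide, bytes if narrow */
    union { uint8_t *narrow; uint32_t *wide; } u;
} stringbuf;

/* Decode one code point from valid UTF-8; return the bytes consumed. */
static size_t utf8_decode(const uint8_t *p, uint32_t *out)
{
    if (p[0] < 0x80) { *out = p[0]; return 1; }
    if (p[0] < 0xE0) { *out = ((p[0] & 0x1F) << 6) | (p[1] & 0x3F); return 2; }
    if (p[0] < 0xF0) {
        *out = ((p[0] & 0x0F) << 12) | ((p[1] & 0x3F) << 6) | (p[2] & 0x3F);
        return 3;
    }
    *out = ((p[0] & 0x07) << 18) | ((p[1] & 0x3F) << 12)
         | ((p[2] & 0x3F) << 6) | (p[3] & 0x3F);
    return 4;
}

/* Build a narrow stringbuf, recording whether it is ASCII-only. */
stringbuf stringbuf_from_utf8(const char *bytes)
{
    stringbuf s = { false, true, strlen(bytes), { NULL } };
    s.u.narrow = malloc(s.len ? s.len : 1);
    assert(s.u.narrow != NULL);
    memcpy(s.u.narrow, bytes, s.len);
    for (size_t i = 0; i < s.len; i++)
        if (s.u.narrow[i] >= 0x80) { s.ascii_only = false; break; }
    return s;
}

/* Upgrade a narrow stringbuf to UTF-32 in place. */
static void widen(stringbuf *s)
{
    /* UTF-8 never has more code points than bytes. */
    uint32_t *w = malloc((s->len ? s->len : 1) * sizeof *w);
    size_t n = 0;
    assert(w != NULL);
    for (size_t i = 0; i < s->len; )
        i += utf8_decode(s->u.narrow + i, &w[n++]);
    free(s->u.narrow);
    s->u.wide = w;
    s->len = n;
    s->is_wide = true;
}

/* string-ref: widen only for non-ASCII narrow bufs and large indices. */
uint32_t stringbuf_ref(stringbuf *s, size_t k)
{
    if (!s->is_wide && !s->ascii_only && k >= REF_WIDEN_THRESHOLD)
        widen(s);
    if (s->is_wide)    { assert(k < s->len); return s->u.wide[k]; }
    if (s->ascii_only) { assert(k < s->len); return s->u.narrow[k]; }

    /* Narrow, non-ASCII, small index: scan at most k+1 code points. */
    size_t i = 0;
    uint32_t c = 0;
    for (size_t n = 0; ; n++) {
        assert(i < s->len);
        i += utf8_decode(s->u.narrow + i, &c);
        if (n == k)
            return c;
    }
}
```

Because the scan in the last branch is bounded by the threshold, the
accessor stays O(1) even when the stringbuf is left narrow.  string-set!
would follow the same pattern, keyed on the stringbuf's length rather
than the index.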

This would keep our accessors O(1), but also ensure that most stringbufs
are narrow.  This is important not only for optimal memory usage, but
also because it means we don't have to worry so much about optimizing
the mixed narrow-wide cases: we can handle those by widening or
narrowing one operand so that both have the same width, and then
calling libunistring.

In the eventual common case, where string-ref and string-set! are rarely
called, almost all stringbufs would be narrow, so converting to UTF-8
becomes an O(1) operation.

What do you think?

    Mark
