On Sun, Jan 14, 2024 at 12:45 PM Bradley Lucier <[email protected]> wrote:

> char-generator does not use an SRFI 14 char-set. This means, for
> implementations that use Unicode, char-generator may generate characters
> that are not valid Unicode code points. I think this is the correct
> behavior, since invalid code points fulfill the char? predicate.
>

Let's be clear about the terminology.

1) The code points of Unicode are the integers in the range #x0 to
#x10FFFF, excluding the range #xD800 to #xDFFF.  Therefore, the
char-generator in the sample implementation must not generate #xD900, for
example, because that is not a Unicode code point.  This needs to be fixed.

2) In R6RS systems, each code point maps to a character using integer->char
and each character maps to a code point using char->integer.  Therefore, all
Unicode characters are supported.  Note that this includes unassigned code
points, whose semantics are unknown.

3) In R7RS systems, each supported Unicode character maps to a code point
and back, and each supported non-Unicode character (if any) maps to an
integer greater than #x10FFFF and back.  All R7RS systems must support the
code points #x0 to #x7F and may support whatever other characters they
want.  Existing R7RS systems AFAIK support all Unicode characters and no
non-Unicode characters.
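
To make point 1 concrete, here is a minimal sketch in R7RS Scheme of how a
generator could avoid the excluded range.  The names scalar-count,
index->code-point and valid-code-point? are hypothetical and are not taken
from the sample implementation; the only idea is to draw an index and shift
it past #xD800 to #xDFFF.

```scheme
(import (scheme base))

;; Hypothetical helper names; not part of any SRFI.

;; How many integers remain in #x0..#x10FFFF once the #x800 values
;; in #xD800..#xDFFF are removed.
(define scalar-count (- #x110000 #x800))

;; Map an index in [0, scalar-count) to an integer in #x0..#x10FFFF
;; that is never inside #xD800..#xDFFF.
(define (index->code-point i)
  (if (< i #xD800)
      i
      (+ i #x800)))

;; True only for exact integers in range and outside #xD800..#xDFFF.
(define (valid-code-point? n)
  (and (exact-integer? n)
       (<= 0 n #x10FFFF)
       (not (<= #xD800 n #xDFFF))))
```

On a system that supports all of Unicode, a generator built this way would
draw a random index below scalar-count and pass (index->code-point i) to
integer->char, so #xD900 could never come out.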

Consequently, it is portable *in practice* to generate characters that
correspond to code points only.  However, SRFI 194's character generators
are initialized from a source string in order to make them guaranteed to be
portable as long as the string is a literal.
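
As an illustration of the source-string approach, a small sketch using SRFI
194's make-random-char-generator.  It assumes the implementation provides
the library under the conventional name (srfi 194); the alphabet string is
an arbitrary example.

```scheme
(import (scheme base) (srfi 194))

;; The generator can only return characters that occur in the literal,
;; so every result is a character the implementation already supports.
(define alphabet "abcdefghijklmnopqrstuvwxyz0123456789")
(define gen (make-random-char-generator alphabet))

;; Each call yields one character chosen from alphabet.
(define c (gen))
```

Because the literal has to be readable by the implementation in the first
place, no question about unsupported characters arises.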


> First question: Why do "invalid code points fulfill the char?
> predicate"?  It does seem to be true, at least for Gambit, but is there
> some document somewhere that specifies this?
>

If Gambit supports #\xD900, it should be fixed not to do that.


> I guess I would propose that (char-generator) return only valid
> characters according to whatever the implementation supports.
>

Unfortunately, R7RS-small doesn't have a reliable way to ask, given an
exact integer, whether it can be mapped to a character using integer->char.
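
To show why, here is a sketch of the obvious probe.  The name
integer->char-or-false is hypothetical.  It only helps where an unsupported
integer either signals an error or fails to round-trip, and R7RS requires
neither.

```scheme
(import (scheme base))

;; Hypothetical probe: return the character for n, or #f if the attempt
;; signals an error or the result does not map back to n.
;; NOT reliable: passing an unsupported integer to integer->char
;; "is an error" in R7RS, which does not oblige the system to signal one.
(define (integer->char-or-false n)
  (guard (e (#t #f))
    (let ((c (integer->char n)))
      (and (char? c)
           (= (char->integer c) n)
           c))))
```

An implementation that accepts #xD900, as the quoted message reports for
Gambit, would pass this probe, so it cannot stand in for a real predicate.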
