On Sun, Jan 14, 2024 at 12:45 PM Bradley Lucier <[email protected]> wrote:

> char-generator does not use an SRFI 14 char-set. This means, for
> implementations that use Unicode, char-generator may generate characters
> that are not valid Unicode code points. I think this is the correct
> behavior, since invalid code points fulfill the char? predicate.

Let's be clear about the terminology.

1) The code points of Unicode are the integers in the range #x0 to
#x10FFFF, excluding the range #xD800 to #xDFFF. Therefore, the
char-generator in the sample implementation must not generate #xD900,
for example, because that is not a Unicode code point. This needs to be
fixed.

2) In R6RS systems, each code point maps to a character using
integer->char and each character maps to a codepoint using
char->integer. Therefore, all Unicode characters are supported. Note
that this includes unassigned code points, whose semantics are unknown.

3) In R7RS systems, each supported Unicode character maps to a codepoint
and back, and each supported non-Unicode character (if any) maps to an
integer greater than #x10FFFF and back. All R7RS systems must support
the codepoints #x0 to #x7F and may support whatever other characters
they want.

Existing R7RS systems AFAIK support all Unicode characters and no
non-Unicode characters. Consequently, it is portable *in practice* to
generate characters that correspond to codepoints only. However, SRFI
194's character generators are initialized from a source string in order
to make them guaranteed to be portable as long as the string is a
literal.

> First question: Why do "invalid code points fulfill the char?
> predicate"? It does seem to be true, at least for Gambit, but is there
> some document somewhere that specifies this?

If Gambit supports #\xD900, it should be fixed not to do that.

> I guess I would propose that (char-generator) return only valid
> characters according to whatever the implementation supports.

Unfortunately, R7RS-small doesn't have a reliable way to ask, given an
exact integer, whether it can be mapped to a character using
integer->char.
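
To make the range in 1) concrete, here is a minimal sketch in R7RS
Scheme. Both procedure names are hypothetical (they come from neither
SRFI 194 nor any report), and the sketch only tests the integer range.
It cannot tell whether a given R7RS implementation actually supports the
corresponding character, which is exactly the gap noted above.

```scheme
(import (scheme base))

;; Hypothetical helper: true when n lies in #x0..#x10FFFF and outside
;; the excluded range #xD800..#xDFFF described in 1).
(define (valid-code-point? n)
  (and (exact-integer? n)
       (<= 0 n #x10FFFF)
       (not (<= #xD800 n #xDFFF))))

;; Hypothetical helper: maps a dense index in 0..#x10F7FF onto the
;; valid integers by skipping the #x800-wide excluded gap.
(define (index->code-point i)
  (if (< i #xD800)
      i
      (+ i #x800)))
```

Under these assumptions, (valid-code-point? #xD900) is #f, and a
generator that draws a uniform index below #x10F800 and passes it
through index->code-point never lands in the excluded range. Whether
integer->char then accepts every such integer still depends on the
implementation.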
