Good morning Jonathan,
>3. Descriptions say they can encode ASCII only. Sorry, but this is nonsense.
>Full unicode support via UTF8 should be supported.
I generally agree, but some caution is warranted here. In particular, we should
be precise about which variant of UTF-8 we mean.
Presumably, a naive implementation that treats 0 bytes specially (as would
happen if the implementation were naively written in C or C++, where by
default strings are terminated by a 0 byte) should work correctly without
having to care whether the encoding is UTF-8 or plain 7-bit ASCII.
This then leads to the so-called Modified UTF-8 used by Java in its native
interface: embedded null characters are encoded as the overlong two-byte
sequence 0xC0 0x80, which is invalid in strict UTF-8 but which naive C and C++
string handling nevertheless treats (mostly) correctly. Should we use Modified
UTF-8, or simply disallow null characters? (Restricting to ASCII does not avoid
this issue, since ASCII also contains the null character, and unlike Modified
UTF-8 it offers no alternative encoding to distinguish an embedded null from
the standard C string-terminating byte 0.)
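To make the trade-off concrete, here is a small Python sketch (the function
names are my own, purely illustrative) of how a Modified-UTF-8-style codec
would keep embedded nulls out of the byte stream, so a naive C-style consumer
never sees a 0x00 byte mid-string:

```python
def encode_modified_utf8(s: str) -> bytes:
    """Encode as UTF-8, but map U+0000 to the overlong pair C0 80."""
    out = bytearray()
    for ch in s:
        if ch == "\x00":
            # Overlong encoding of NUL; invalid in strict UTF-8, but it
            # avoids emitting a 0x00 byte that C string code would treat
            # as a terminator.
            out += b"\xc0\x80"
        else:
            out += ch.encode("utf-8")
    return bytes(out)

def decode_modified_utf8(b: bytes) -> str:
    """Inverse: turn C0 80 pairs back into U+0000, decode the rest.

    0xC0 never occurs in valid UTF-8 (all C0 xx sequences are overlong),
    so C0 80 can only have come from an encoded null.
    """
    return b.replace(b"\xc0\x80", b"\x00").decode("utf-8")

encoded = encode_modified_utf8("a\x00b")
assert 0 not in encoded                         # no embedded zero byte
assert decode_modified_utf8(encoded) == "a\x00b"
```

A backend that instead chose the "disallow nulls" option would simply reject
any description containing U+0000 at validation time.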
In addition, pulling in UTF-8 raises the issue of Unicode normalization:
multiple different UTF-8 byte sequences may render as the same sequence of
human-readable glyphs. Specifying ASCII avoids this issue. Should we specify
some Unicode normalization, and should GUIs at least try to impose it (even if
backends/daemons simply ignore the description and hence any normalization
issues)?
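To illustrate the normalization concern, here is a short Python sketch showing
two distinct UTF-8 byte sequences that render as the same glyph, and NFC
normalization (one of the standard Unicode normalization forms) collapsing
them to a single canonical form:

```python
import unicodedata

precomposed = "\u00e9"    # e-acute as a single code point
decomposed = "e\u0301"    # 'e' followed by COMBINING ACUTE ACCENT

# Same glyph on screen, but different strings and different UTF-8 bytes:
assert precomposed != decomposed
assert precomposed.encode("utf-8") == b"\xc3\xa9"
assert decomposed.encode("utf-8") == b"e\xcc\x81"

# Normalizing both sides to NFC makes the comparison succeed:
assert unicodedata.normalize("NFC", decomposed) == precomposed
```

If the spec mandated, say, NFC, a GUI could normalize descriptions before
sending, and a daemon that ignores descriptions entirely would lose nothing.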
Regards,
ZmnSCPxj
_______________________________________________
Lightning-dev mailing list
[email protected]
https://lists.linuxfoundation.org/mailman/listinfo/lightning-dev