Good morning Jonathan,

>3. Descriptions say they can encode ASCII only. Sorry, but this is nonsense. 
>Full unicode support via UTF8 should be supported.

I generally agree, but caution is warranted here.  In particular, we should 
be precise about which variant of UTF8.

Presumably, a naive implementation that specially treats 0 bytes (as would 
happen if the implementation were naively written in C or C++, where by 
default strings are terminated by a 0 byte) should work correctly without 
having to care whether the encoding is UTF8 or plain 7-bit ASCII.  
This then leads to the use of so-called Modified UTF8, as used by Java in its 
native interface: embedded null characters are encoded as the overlong 
two-byte sequence 0xC0 0x80, which is normally invalid in UTF8, but which 
naive treatment by C and C++ handles (mostly) correctly.  Should we use 
Modified UTF8 or simply disallow null characters? (Use of ASCII does not 
avoid this question either: ASCII has no alternative encoding for the null 
character, which collides with the standard C string-terminating byte 0.)
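To make the trade-off concrete, here is a minimal sketch (not from the original email, and purely illustrative) of a Java-style Modified UTF8 codec: U+0000 is written as the overlong two-byte sequence 0xC0 0x80, so the encoded bytes never contain a raw 0 byte and remain safe to pass through NUL-terminated C strings:

```python
def encode_modified_utf8(s: str) -> bytes:
    """Encode like UTF-8, except U+0000 becomes the overlong pair C0 80."""
    out = bytearray()
    for ch in s:
        if ch == "\x00":
            out += b"\xc0\x80"  # overlong encoding of NUL (invalid in strict UTF-8)
        else:
            out += ch.encode("utf-8")
    return bytes(out)

def decode_modified_utf8(b: bytes) -> str:
    """Undo the overlong NUL encoding, then decode as strict UTF-8."""
    return b.replace(b"\xc0\x80", b"\x00").decode("utf-8")

s = "a\x00b"
enc = encode_modified_utf8(s)
assert 0 not in enc                     # no raw NUL byte survives encoding
assert decode_modified_utf8(enc) == s   # round-trips the embedded null
```

A strict UTF8 decoder would reject 0xC0 0x80 as an overlong sequence, which is exactly why the two sides must agree on which variant is in use.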

In addition, pulling in UTF8 brings in the issue of Unicode normalization.  
Multiple different UTF8 byte sequences may render as the same sequence of 
human-readable glyphs; specifying ASCII avoids this issue.  Should we specify 
some Unicode normalization, and should GUIs at least try to impose this 
normalization (even if backends/daemons simply ignore the description and 
hence any normalization issues)?
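As a concrete illustration of the normalization problem (my example, not from the email): the glyph "é" can be encoded either as the precomposed code point U+00E9 or as "e" followed by the combining accent U+0301, giving two distinct byte sequences for the same visible text:

```python
import unicodedata

precomposed = "\u00e9"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # 'e' + U+0301 COMBINING ACUTE ACCENT

# Same glyph on screen, but different code points and different UTF-8 bytes:
assert precomposed != decomposed
assert precomposed.encode("utf-8") != decomposed.encode("utf-8")

# Applying a normalization form (here NFC) makes them compare equal:
assert unicodedata.normalize("NFC", decomposed) == precomposed
```

Without an agreed-upon normalization form, two descriptions that look identical to a human may compare unequal byte-for-byte.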

Regards,
ZmnSCPxj
_______________________________________________
Lightning-dev mailing list
[email protected]
https://lists.linuxfoundation.org/mailman/listinfo/lightning-dev