On 04/07/2022 7:18 AM, Ola Fosheim Grøstad wrote:
> I hardly ever use anything outside UTF-8, and if I do then I use a well-
> tested Unicode library, as it has to be correct and up to date to be
> useful. The utility of going beyond UTF-8 seems to be limited:
> https://en.wikipedia.org/wiki/UTF-32#Analysis
I have just finished implementing string normalization, which is built
around UTF-32.
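One reason a code-point (UTF-32) representation is convenient: normalization works on code points, not bytes. For example, the canonical ordering step sorts each run of combining marks by combining class, which is easy with one fixed-width unit per code point and awkward on variable-length UTF-8. A minimal sketch, assuming Python's `str` (indexed by code point) stands in for a UTF-32 buffer; `canonical_order` is a hypothetical helper for illustration, not the implementation mentioned above, and it covers only the reordering step, not decomposition or composition.

```python
import unicodedata

def canonical_order(s: str) -> str:
    """Hypothetical helper: Unicode canonical ordering only.

    Stable-sorts every maximal run of non-starters (combining class > 0)
    by combining class. No decomposition or composition is done here.
    """
    out = list(s)  # one element per code point, like a UTF-32 buffer
    i = 0
    while i < len(out):
        if unicodedata.combining(out[i]) == 0:
            i += 1  # starter: never moves
            continue
        j = i
        while j < len(out) and unicodedata.combining(out[j]) != 0:
            j += 1
        # sorted() is stable, so marks with equal classes keep their order
        out[i:j] = sorted(out[i:j], key=unicodedata.combining)
        i = j
    return "".join(out)

# 'a' + COMBINING ACUTE ACCENT (class 230) + COMBINING DOT BELOW (class 220)
s = "a\u0301\u0323"
print([hex(ord(c)) for c in canonical_order(s)])  # ['0x61', '0x323', '0x301']
```

The same two-mark sequence is 2 code points but 3 bytes in UTF-8 for `"e\u0301"`, which is why byte-oriented code has to decode before it can reorder anything.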
It is required for string equivalence comparisons (which is what you
should be doing in a LOT more cases!). Anything user-provided should be
normalized before it is compared.
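To illustrate why comparing raw strings is not enough: "café" can be encoded with a precomposed é (U+00E9) or with e followed by U+0301 COMBINING ACUTE ACCENT. A minimal sketch in Python using the standard `unicodedata.normalize`; the helper name `normalized_equal` and the choice of NFC as the default form are assumptions for illustration.

```python
import unicodedata

def normalized_equal(a: str, b: str, form: str = "NFC") -> bool:
    # Hypothetical helper: normalize both inputs to the same form, then compare
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

precomposed = "caf\u00e9"  # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "cafe\u0301"  # 'e' + U+0301 COMBINING ACUTE ACCENT

print(precomposed == decomposed)                  # False
print(normalized_equal(precomposed, decomposed))  # True
```

NFC and NFD only unify canonically equivalent strings; NFKC/NFKD also fold compatibility characters (e.g. the "ﬁ" ligature to "fi"), which may or may not be wanted for a given kind of user input.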