Hi,

Aleksander Morgado wrote:
> Small questions regarding casefolding in UTF-8:
>
> — Function: uint8_t * u8_casefold (const uint8_t *s, size_t n, const
> char *iso639_language, uninorm_t nf, uint8_t *resultbuf, size_t
> *lengthp)
>
> What if the resultbuf passed doesn't have enough space for the
> case-folded and normalized string?

This is documented at the end of the doc section "Conventions":
<http://www.gnu.org/software/libunistring/manual/html_node/Conventions.html>

  "Functions returning a string result take a (resultbuf, lengthp)
   argument pair. If resultbuf is not NULL and the result fits into
   *lengthp units, it is put in resultbuf, and resultbuf is returned.
   Otherwise, a freshly allocated string is returned. In both cases,
   *lengthp is set to the length (number of units) of the returned
   string. In case of error, NULL is returned and errno is set."

> And, if NFC normalization desired in the output, would it be safe to say
> that the output length will be less or equal than the input length?

No, it is not. The file tests/test-u8-casefold.c has a couple of
examples that show a case-folded string can be longer than the
original string.

In summary, these Unicode-aware string manipulations involve such
complex details that the classical assumptions all fail.

Bruno
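The (resultbuf, lengthp) convention quoted from the "Conventions" section can be illustrated with a minimal sketch. The function `demo_ascii_tolower` below is hypothetical (it is not part of libunistring) and only lowercases ASCII; what matters is how it decides between the caller's buffer and a fresh allocation, and that `*lengthp` always ends up holding the length of whatever is returned.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical example function, NOT a libunistring API: returns s with
   ASCII letters lowercased, following the (resultbuf, lengthp) convention.
   On input, *lengthp is the capacity of resultbuf (in units);
   on output, it is the length of the returned string. */
static uint8_t *
demo_ascii_tolower (const uint8_t *s, size_t n,
                    uint8_t *resultbuf, size_t *lengthp)
{
  uint8_t *result;

  assert (lengthp != NULL);

  if (resultbuf != NULL && n <= *lengthp)
    result = resultbuf;                 /* result fits: reuse caller's buffer */
  else
    {
      result = malloc (n > 0 ? n : 1);  /* otherwise return fresh memory */
      if (result == NULL)
        {
          errno = ENOMEM;               /* error: NULL with errno set */
          return NULL;
        }
    }

  for (size_t i = 0; i < n; i++)
    result[i] = (s[i] >= 'A' && s[i] <= 'Z')
                ? (uint8_t) (s[i] - 'A' + 'a')
                : s[i];

  *lengthp = n;                         /* set in both cases */
  return result;
}
```

On the caller side the same pattern should apply to `u8_casefold`: pass a buffer together with its size in `*lengthp`, treat a NULL return as an error (check `errno`), and call `free` on the result only when it differs from the buffer you passed in.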
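To see concretely why "output length <= input length" fails even with NFC, here is a small self-contained sketch that measures UTF-8 byte lengths before and after a few full case foldings. The three entries are copied by hand from Unicode's CaseFolding.txt (status F). The table `sample_folds` and the helper functions are illustrative assumptions, not libunistring code, and these are not necessarily the examples used in tests/test-u8-casefold.c. U+0130 folds to "i" plus U+0307 COMBINING DOT ABOVE, and NFC does not recombine that pair because no precomposed lowercase "i with dot above" exists. Likewise, U+0149 has only a compatibility decomposition, so NFC leaves U+02BC U+006E alone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one valid code point (<= 0x10FFFF, not a surrogate) as UTF-8.
   Returns the number of bytes written to out. */
static size_t
utf8_encode (uint32_t c, uint8_t out[4])
{
  if (c < 0x80)
    {
      out[0] = (uint8_t) c;
      return 1;
    }
  if (c < 0x800)
    {
      out[0] = (uint8_t) (0xC0 | (c >> 6));
      out[1] = (uint8_t) (0x80 | (c & 0x3F));
      return 2;
    }
  if (c < 0x10000)
    {
      out[0] = (uint8_t) (0xE0 | (c >> 12));
      out[1] = (uint8_t) (0x80 | ((c >> 6) & 0x3F));
      out[2] = (uint8_t) (0x80 | (c & 0x3F));
      return 3;
    }
  out[0] = (uint8_t) (0xF0 | (c >> 18));
  out[1] = (uint8_t) (0x80 | ((c >> 12) & 0x3F));
  out[2] = (uint8_t) (0x80 | ((c >> 6) & 0x3F));
  out[3] = (uint8_t) (0x80 | (c & 0x3F));
  return 4;
}

/* Total UTF-8 length, in bytes, of a sequence of code points. */
static size_t
utf8_length (const uint32_t *cps, size_t count)
{
  uint8_t tmp[4];
  size_t total = 0;
  for (size_t i = 0; i < count; i++)
    total += utf8_encode (cps[i], tmp);
  return total;
}

/* Hypothetical sample table: a few full case foldings (status F)
   copied from CaseFolding.txt. Not a real API. */
struct fold_entry
{
  uint32_t from;
  uint32_t to[3];
  size_t to_len;
};

static const struct fold_entry sample_folds[] = {
  { 0x0130, { 0x0069, 0x0307 }, 2 },  /* İ: 2 bytes -> 3 bytes */
  { 0x0149, { 0x02BC, 0x006E }, 2 },  /* ŉ: 2 bytes -> 3 bytes */
  { 0x00DF, { 0x0073, 0x0073 }, 2 },  /* ß: 2 bytes -> 2 bytes */
};
```

In UTF-8 the ß case happens to keep its length, but it still grows from 1 to 2 units in UTF-16 or UTF-32, which is one more reason not to size output buffers from the input length.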
