Well, r30485 did not change that assumption -- as you can see, the old code _also_ simply assumed that 'output' was 'big enough'. The few places in the code that called this function all allocated 'output' as strlen(input)+1, so in cases where the utf8 toupper returns a longer string, the code was already incorrect before r30485 in the same way. With the new API it might just be more obvious that the caller is/was expected to allocate 'output' (and that this might be asking a bit much).
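To make the length problem concrete, here is a rough sketch. It is not GNUnet code: 'toy_utf8_toupper' is a made-up stand-in for u8_toupper that only knows ASCII a-z and U+0149 (whose uppercase per Unicode SpecialCasing is U+02BC followed by 'N'), so that it builds without libunistring. The only point is that the uppercase form can need more bytes than strlen(input)+1.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for u8_toupper(), for illustration only.
 * Handles ASCII a-z and U+0149 (UTF-8: 0xC5 0x89), whose uppercase
 * is U+02BC U+004E (UTF-8: 0xCA 0xBC 0x4E); all other bytes are
 * copied unchanged.  Returns a malloc'ed, 0-terminated string,
 * or NULL if the allocation fails. */
static char *
toy_utf8_toupper (const char *input)
{
  size_t in_len = strlen (input);
  /* Worst case for this toy: each 2-byte sequence grows to 3 bytes. */
  size_t cap = in_len + in_len / 2 + 1;
  char *out = malloc (cap);
  size_t o = 0;

  if (NULL == out)
    return NULL;
  for (size_t i = 0; i < in_len; i++)
  {
    unsigned char c = (unsigned char) input[i];

    if ( (0xC5 == c) &&
         (i + 1 < in_len) &&
         (0x89 == (unsigned char) input[i + 1]) )
    {
      /* 2 input bytes become 3 output bytes */
      out[o++] = (char) 0xCA;
      out[o++] = (char) 0xBC;
      out[o++] = 'N';
      i++;
    }
    else if ( (c >= 'a') && (c <= 'z') )
    {
      out[o++] = (char) (c - 'a' + 'A');
    }
    else
    {
      out[o++] = (char) c;
    }
  }
  assert (o < cap); /* room for the terminator is left */
  out[o] = '\0';
  return out;
}
```

For the 4-byte input "\xC5\x89" "ab" (U+0149 followed by "ab"), a caller following the existing pattern would allocate 5 bytes, while the result here is 5 bytes long and needs 6 with the terminator.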
Maybe we should just change this API to *return* an allocated string instead of passing 'output'? I don't quite understand why this API was written like this to begin with -- returning the uppercase string would seem more natural. If you change this, please also change the tolower function in the same way.

Happy hacking!

Christian

On 10/31/2013 12:09 AM, LRN wrote:
> On 30.10.2013 22:15, [email protected] wrote:
>> Author: grothoff
>> Date: 2013-10-30 19:15:48 +0100 (Wed, 30 Oct 2013)
>> New Revision: 30485
>
>>  /**
>> - * Convert the utf-8 input string to uppercase
>> - * Output needs to be allocated appropriately
>> + * Convert the utf-8 input string to uppercase.
>> + * Output needs to be allocated appropriately.
>>   *
>>   * @param input input string
>>   * @param output output buffer
>>   */
>>  void
>> -GNUNET_STRINGS_utf8_toupper(const char* input, char** output)
>> +GNUNET_STRINGS_utf8_toupper(const char *input,
>> +                            char *output)
>>  {
>>    uint8_t *tmp_in;
>>    size_t len;
>
>>    tmp_in = u8_toupper ((uint8_t*)input, strlen ((char *) input),
>>                         NULL, UNINORM_NFD, NULL, &len);
>> -  memcpy(*output, tmp_in, len);
>> -  (*output)[len] = '\0';
>> -  free(tmp_in);
>> +  memcpy (output, tmp_in, len);
>> +  output[len] = '\0';
>> +  free (tmp_in);
>>  }
>
> u8_toupper allocates its output, then you copy it into the buffer that
> user provided, using the length that u8_toupper reported (not the actual
> length of the buffer).
>
> I'm not sure that this conversion always produces the output that has
> the same length as the input (which is, AFAIU, what you're relying on),
> not for all languages.
>
> The docs that i've found on UNINORM_NFD do not indicate (AFAICU) that
> this is some kind of special transform that guarantees the same (or
> less) number of bytes in the output.
>
>
> _______________________________________________
> GNUnet-developers mailing list
> [email protected]
> https://lists.gnu.org/mailman/listinfo/gnunet-developers
>
