Hi Marvin,

what do you think is faster: scanning the input or allocating more memory? For European languages the UTF-8 length in bytes should be between 1 and 2 times wcslen(src).
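To make the "scan the input" option concrete, here is a minimal sketch in plain C of a single pass that adds up the UTF-8 bytes a wide string would need. The names utf8_len_of and utf8_bytes_needed are hypothetical and not part of the Perl API. The sketch assumes each wchar_t holds one whole code point (as with a 32-bit wchar_t) and treats anything above U+FFFF as 4 bytes; Perl's own encoder may emit longer sequences for values outside the Unicode range.

```c
#include <stddef.h>
#include <wchar.h>

/* Bytes needed to encode one code point as standard UTF-8.
 * Assumption: values above U+10FFFF are counted as 4 bytes. */
static size_t utf8_len_of(unsigned long cp)
{
    if (cp < 0x80UL)    return 1;   /* ASCII */
    if (cp < 0x800UL)   return 2;   /* Latin accents, Greek, Cyrillic, ... */
    if (cp < 0x10000UL) return 3;   /* rest of the BMP */
    return 4;                       /* above the BMP */
}

/* One pass over the wide string: total UTF-8 bytes, excluding the
 * terminating NUL. */
static size_t utf8_bytes_needed(const wchar_t *src)
{
    size_t total = 0;
    while (*src) {
        total += utf8_len_of((unsigned long) *src++);
    }
    return total;
}
```

For the three wide characters of "été" this returns 5, which sits inside the 1x to 2x range mentioned above. The scan costs one extra read of the input, while the fixed-factor allocation costs memory proportional to the worst case.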
Also, where does the +1 come from?

Last but not least, I just wanted to mention that there is a bug in the INPUT part, as I used char instead of U8. Here is the correct code:

INPUT
T_WCHAR
{
    // Alloc memory for wide char string. This could be a bit more
    // than necessary.
    Newz(0, $var, SvLEN($arg), wchar_t);

    U8* src = (U8*) SvPV_nolen($arg);
    wchar_t* dst = (wchar_t*) $var;

    if (SvUTF8($arg)) {
        // UTF8 to wide char mapping
        STRLEN len;
        while (*src) {
            *dst++ = utf8_to_uvuni(src, &len);
            src += len;
        }
    } else {
        // char to wide char mapping
        while (*src) {
            *dst++ = (wchar_t) *src++;
        }
    }
    *dst = 0;
    SAVEFREEPV($var);
}

Thomas.

Quoting Marvin Humphrey <[EMAIL PROTECTED]>:

>
> On Nov 29, 2006, at 10:55 AM, Thomas Busch wrote:
>
> > // Alloc memory for wide char string. This is clearly wider
> > // then necessary in most cases but no choice.
> > Newz(0, dst, 3 * wcslen(src), U8);
>
> I think you need to bump that allocation to 4 * wcslen(src) + 1,
> otherwise you run the risk of a buffer overflow in the event that
> your data has too many code points above the BMP. Alternately, you
> can scan the input first and determine how much space you need to
> allocate.
>
> > while (*src) {
> >     d = uvuni_to_utf8(d, *src++);
> > }
> > *d = 0;
>
> I assume that uvuni_to_utf8 handles invalid input safely.
>
> The crucial thing here is not to open a security hole. If a user can
> supply input, assume that pathologically munged input is on its way.
> Since this is typemap code, many functions are potentially affected.
>
> Marvin Humphrey
> Rectangular Research
> http://www.rectangular.com/
>
>
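For the INPUT direction, the same "scan first" idea could be sketched as below: count the code points in the UTF-8 buffer before allocating, and refuse input whose sequences are truncated or have a bad lead or continuation byte. This is plain C rather than typemap code, the name utf8_count_code_points is hypothetical, and the check is deliberately partial (overlong forms and surrogates are not rejected), so treat it as an illustration of the munged-input concern and not as a complete validator.

```c
#include <stddef.h>

/* Count code points in a NUL-terminated UTF-8 buffer.
 * Returns (size_t)-1 on a truncated sequence or a bad lead or
 * continuation byte. Overlong forms and surrogates are NOT rejected. */
static size_t utf8_count_code_points(const unsigned char *s)
{
    size_t count = 0;
    while (*s) {
        size_t need, i;
        if      (*s < 0x80)           need = 1;
        else if ((*s & 0xE0) == 0xC0) need = 2;
        else if ((*s & 0xF0) == 0xE0) need = 3;
        else if ((*s & 0xF8) == 0xF0) need = 4;
        else return (size_t) -1;      /* stray continuation or invalid lead */

        for (i = 1; i < need; i++) {
            /* A NUL fails this test too, so the loop stops at the
             * terminator instead of reading past it. */
            if ((s[i] & 0xC0) != 0x80) return (size_t) -1;
        }
        s += need;
        count++;
    }
    return count;
}
```

A caller would then allocate count + 1 wide characters, the extra one being for the terminating 0 that the conversion loop writes at the end. Every code point takes at least one byte, so the byte length of the input is always a safe upper bound when a second pass is not wanted.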