Re: T_WCHAR final

Thomas Busch Thu, 30 Nov 2006 07:39:14 -0800

Hi Marvin,

Which do you think is faster: scanning the input or
allocating more memory? For European languages
the UTF-8 length should be between 1 and 2 times wcslen(src).


Also, where does the +1 come from?
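
To make the trade-off concrete, here is a minimal sketch of the scanning
alternative in plain C, without the Perl API. The helper name
utf8_len_for_wide is hypothetical, and the sketch assumes each wchar_t holds
one whole code point (a 32-bit wchar_t, as on Linux). UTF-8 needs at most
4 bytes per code point, which is where the factor 4 in the quoted bound comes
from; the +1 is presumably the terminating NUL that the output loop writes
with *d = 0.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: number of UTF-8 bytes needed to encode the
 * NUL-terminated wide string src, NOT counting the trailing NUL.
 * Assumes one wchar_t per code point. */
static size_t utf8_len_for_wide(const wchar_t *src)
{
    size_t n = 0;

    assert(src != NULL);
    for (; *src; src++) {
        unsigned long cp = (unsigned long) *src;

        if (cp < 0x80UL)
            n += 1;             /* ASCII */
        else if (cp < 0x800UL)
            n += 2;             /* most European non-ASCII letters */
        else if (cp < 0x10000UL)
            n += 3;             /* rest of the BMP */
        else
            n += 4;             /* above the BMP (or out of range) */
    }
    return n;
}
```

A caller would allocate utf8_len_for_wide(src) + 1 bytes instead of
4 * wcslen(src) + 1. The scan costs one extra pass over the input, so whether
it beats over-allocating depends on string length and allocator behaviour.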

Last but not least, I wanted to mention that there is
a bug in the INPUT part: I used char instead of U8.

Here is the corrected code:

INPUT
T_WCHAR
        {
          // Allocate memory for the wide char string.  This could be a bit
          // more than necessary.
          Newz(0, $var, SvLEN($arg), wchar_t);

          U8* src = (U8*) SvPV_nolen($arg);
          wchar_t* dst = (wchar_t*) $var;

          if (SvUTF8($arg)) {
            // UTF8 to wide char mapping
            STRLEN len;
            while (*src) {
              *dst++ = utf8_to_uvuni(src, &len);
              src += len;
            }
          } else {
            // char to wide char mapping
            while (*src) {
              *dst++ = (wchar_t) *src++;
            }
          }
          *dst = 0;
          SAVEFREEPV($var);
        }
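
Regarding the concern about munged input: the UTF-8 branch above advances by
whatever length the decoder reports. As an illustration only, here is a
standalone sketch in plain C of the same mapping with explicit validation and
a bounded destination. The helper utf8_to_wide is hypothetical and not part
of the Perl API, and it again assumes a wchar_t wide enough for one code
point.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: decode the NUL-terminated UTF-8 string src into dst,
 * which has room for dst_cap wide characters including the terminator.
 * Returns the number of characters written (excluding the terminator),
 * or (size_t)-1 on malformed input or insufficient space. */
static size_t utf8_to_wide(const unsigned char *src, wchar_t *dst,
                           size_t dst_cap)
{
    size_t n = 0;

    assert(src != NULL && dst != NULL);
    if (dst_cap == 0)
        return (size_t) -1;

    while (*src) {
        unsigned long cp, min;
        int extra, i;
        unsigned char b = *src++;

        if (b < 0x80)                { cp = b;        extra = 0; min = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; min = 0x80; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; min = 0x800; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; min = 0x10000; }
        else
            return (size_t) -1;   /* stray continuation or invalid lead byte */

        for (i = 0; i < extra; i++) {
            /* A NUL fails this test too, so we never read past the end. */
            if ((*src & 0xC0) != 0x80)
                return (size_t) -1;
            cp = (cp << 6) | (unsigned long) (*src++ & 0x3F);
        }

        /* Reject overlong forms, surrogates, and values beyond U+10FFFF. */
        if (cp < min || cp > 0x10FFFFUL || (cp >= 0xD800UL && cp <= 0xDFFFUL))
            return (size_t) -1;

        if (n + 1 >= dst_cap)
            return (size_t) -1;   /* keep room for the terminator */
        dst[n++] = (wchar_t) cp;
    }
    dst[n] = L'\0';
    return n;
}
```

Since every UTF-8 sequence is at least one byte long, a destination of
strlen(src) + 1 wide characters is always enough for this sketch.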


Thomas.

Quoting Marvin Humphrey <[EMAIL PROTECTED]>:

> 
> On Nov 29, 2006, at 10:55 AM, Thomas Busch wrote:
> >           // Alloc memory for wide char string.  This is clearly wider
> >           // then necessary in most cases but no choice.
> >           Newz(0, dst, 3 * wcslen(src), U8);
> 
> I think you need to bump that allocation to 4 * wcslen(src) + 1,  
> otherwise you run the risk of a buffer overflow in the event that  
> your data has too many code points above the BMP.  Alternately, you  
> can scan the input first and determine how much space you need to  
> allocate.
> 
> >           while (*src) {
> >             d = uvuni_to_utf8(d, *src++);
> >           }
> >           *d = 0;
> 
> I assume that uvuni_to_utf8 handles invalid input safely.
> 
> The crucial thing here is not to open a security hole.  If a user can  
> supply input, assume that pathologically munged input is on its way.   
> Since this is typemap code, many functions are potentially affected.
> 
> Marvin Humphrey
> Rectangular Research
> http://www.rectangular.com/
> 
>
