Hi,

You may want to seriously consider normalizing your strings, which
eliminates or reduces false negatives when matching strings that
look identical but are encoded differently.

The Unicode normalizations are defined in UAX #15 "Unicode
Normalization Forms", available at:

    http://www.unicode.org/reports/tr15/

Also relevant is UTR 36 "Unicode Security Considerations" at:

    http://www.unicode.org/reports/tr36/

The latter discusses the security issues inherent in using
Unicode (which has alternative ways of encoding many of the
characters).
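
To illustrate the false negatives mentioned above, here is a minimal
sketch in C. The names cafe_nfc, cafe_nfd and bytes_equal are my own,
made up for illustration. Both strings display as "café", but one uses
the precomposed U+00E9 and the other uses "e" followed by the combining
accent U+0301, so a byte-wise comparison reports them as different.

```c
#include <string.h>

/* "café" with the accent as one precomposed code point, U+00E9 (NFC). */
static const char cafe_nfc[] = "caf\xC3\xA9";

/* "café" as "e" followed by U+0301 COMBINING ACUTE ACCENT (NFD). */
static const char cafe_nfd[] = "cafe\xCC\x81";

/* Byte-wise equality: returns 1 only if both byte sequences are identical. */
int bytes_equal(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```

Here bytes_equal(cafe_nfc, cafe_nfd) returns 0 even though a reader
would call the two strings equal. Normalizing both sides to the same
form before comparing avoids this. The sketch does not implement
normalization itself; that needs the Unicode data tables and is
normally done with a library.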

Cheers,
- Ira


Ira McDonald (Musician / Software Architect)
Blue Roof Music / High North Inc
PO Box 221  Grand Marais, MI  49839
phone: +1-906-494-2434
email: [EMAIL PROTECTED]

> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] Behalf Of [EMAIL PROTECTED]
> Sent: Tuesday, October 11, 2005 9:50 AM
> To: [email protected]
> Cc: [EMAIL PROTECTED]
> Subject: Re: Using utf-8 in an application
> 
> 
> >
> >
> >> Here are the questions.
> >>
> >> 1) In livido.h we #include <wchar.h>
> >
> >
> > You shouldn't need to include anything.
> >
> >
> >>
> >> 2) For getting the utf-8 string length in bytes, we use wcslen().
> >> Is this the correct function ?
> >
> >
> > No, regular strlen() works fine for getting the length of a utf-8
> > string in bytes.
> >
> >
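
A small sketch of the difference between bytes and characters, in case
it helps. strlen() gives the byte length; the helper utf8_codepoints
below is a hypothetical name of mine, not a library function, and it
assumes its input is already valid UTF-8.

```c
#include <stddef.h>
#include <string.h>

/* Count code points in a valid UTF-8 string by skipping continuation
 * bytes, which always have the bit pattern 10xxxxxx.
 * Assumption: the input is valid UTF-8; no validation is done here. */
size_t utf8_codepoints(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```

For "na\xC3\xAFve" (the word "naïve"), strlen() returns 6 because the
"ï" takes two bytes, while utf8_codepoints() returns 5.
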
> >>
> >>
> >> 3) When a string is retrieved, we must add a utf-8 terminating
> >> NULL to the end. How is this done ?
> >
> >
> > The same as you would with an ASCII string: both use a single 0
> > byte as a terminating null.
> >
> >
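
As a sketch of the above, assuming the string is retrieved as a pointer
plus a byte count (the function name utf8_dup_terminated and that
interface are my invention, not anything from livido.h):

```c
#include <stdlib.h>
#include <string.h>

/* Copy len bytes of UTF-8 data into a freshly allocated buffer and
 * append a single 0 byte, exactly as one would for an ASCII string.
 * Returns NULL if allocation fails; the caller must free() the result. */
char *utf8_dup_terminated(const char *src, size_t len)
{
    char *out = malloc(len + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, src, len);
    out[len] = '\0';
    return out;
}
```

Note that len is counted in bytes, not characters, so a cut in the
middle of a multi-byte sequence would leave a truncated character at
the end of the copy.
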
> >>
> >>
> >> 4) For testing purposes, I want to create a utf-8 string. Is
> >> there a simple way to convert a char *string to utf-8 ?
> >
> >
> > A char* string can hold utf-8 directly, and a plain ASCII char*
> > string is already a valid utf-8 string.
> >
> >
> > Unless you plan on doing fancy things such as normalization, you
> > can treat utf-8 strings as null-terminated sequences of non-null
> > bytes.
> >
> >
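
For building a test string, one possible approach (a sketch, with
test_utf8 and is_ascii being names I made up) is to spell out the
UTF-8 bytes with hex escapes in an ordinary string literal. Pure ASCII
text needs no conversion at all; text in some other 8-bit encoding
such as ISO-8859-1 would need a real conversion step, which this
sketch does not cover.

```c
#include <stddef.h>

/* "école" written out as UTF-8 bytes: U+00E9 is C3 A9.
 * The literal is split in two because a hex escape swallows every
 * following hex digit, and "c" is a hex digit. */
static const char test_utf8[] = "\xC3\xA9" "cole";

/* Returns 1 if every byte is below 0x80, meaning the string is plain
 * ASCII and therefore already valid UTF-8 as it stands. */
int is_ascii(const char *s)
{
    for (; *s != '\0'; s++) {
        if ((unsigned char)*s >= 0x80)
            return 0;
    }
    return 1;
}
```

Here test_utf8 occupies 6 bytes plus the terminating 0 byte, and
is_ascii(test_utf8) returns 0 while is_ascii("hello") returns 1.
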
> 
> 
> Great !
> Thanks for your quick reply :-)
> 
> Gabriel.
> 
> 
> 
> --
> Linux-UTF8:   i18n of Linux on all levels
> Archive:      http://mail.nl.linux.org/linux-utf8/
> 

