Hi,
You may want to seriously consider doing normalization (which
eliminates or reduces false negatives in string matches).
The Unicode normalizations are defined in UAX 15, available at:
http://www.unicode.org/reports/tr15/
Also relevant is UTR 36 "Unicode Security Considerations" at:
http://www.unicode.org/reports/tr36/
The latter discusses the security issues inherent in using
Unicode (which has alternative ways of encoding many of the
characters).
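A minimal C sketch of the false negative that normalization prevents. There are two canonically equivalent spellings of "é": the precomposed U+00E9, and "e" followed by U+0301 COMBINING ACUTE ACCENT. They differ byte for byte, so strcmp() reports them as different. The names below are illustrative only. The normalization itself would come from a library such as ICU or utf8proc and is not shown here.

```c
#include <assert.h>
#include <string.h>

/* "é" as one precomposed code point: U+00E9 LATIN SMALL LETTER E WITH ACUTE */
static const char precomposed[] = "\xC3\xA9";

/* "é" as a base letter plus a combining mark: U+0065 U+0301 COMBINING ACUTE ACCENT */
static const char decomposed[] = "e\xCC\x81";

/* Illustrative helper: byte-wise comparison, which is all strcmp() does
   on UTF-8 data. Returns 1 if the strings are byte-for-byte identical. */
static int same_bytes(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    return strcmp(a, b) == 0;
}
```

The two strings are 2 and 3 bytes long and compare as unequal. Normalizing both to the same form (NFC or NFD, per UAX 15) before comparing is what makes such a match succeed.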
Cheers,
- Ira
Ira McDonald (Musician / Software Architect)
Blue Roof Music / High North Inc
PO Box 221 Grand Marais, MI 49839
phone: +1-906-494-2434
email: [EMAIL PROTECTED]
> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] Behalf Of [EMAIL PROTECTED]
> Sent: Tuesday, October 11, 2005 9:50 AM
> To: [email protected]
> Cc: [EMAIL PROTECTED]
> Subject: Re: Using utf-8 in an application
>
>
> >
> >
> >> Here are the questions.
> >>
> >> 1) In livido.h we #include <wchar.h>
> >
> >
> > You shouldn't need to include anything UTF-8 specific; the usual
> > <string.h> (for strlen() etc.) is enough.
> >
> >
> >>
> >> 2) For getting the UTF-8 string length in bytes, we use wcslen().
> >> Is this the correct function?
> >
> >
> > No. Regular strlen() works fine for the length of a UTF-8 string in
> > bytes; wcslen() counts wchar_t units in a wide-character string instead.
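A small illustration, assuming valid UTF-8 input. strlen() gives the byte count the question asks about. If a code point count is ever needed instead, it only requires skipping continuation bytes of the form 10xxxxxx. The helper utf8_codepoints is a hypothetical name, not part of any library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: count Unicode code points in a null-terminated
   UTF-8 string by counting every byte that is NOT a continuation byte
   (10xxxxxx). Assumes valid UTF-8; no validation is done here. */
static size_t utf8_codepoints(const char *s)
{
    size_t count = 0;
    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

For "h\xC3\xA9llo" ("héllo"), strlen() returns 6 and utf8_codepoints() returns 5.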
> >
> >
> >>
> >>
> >> 3) When a string is retrieved, we must add a UTF-8 terminating null
> >> to the end. How is this done?
> >
> >
> > The same as with an ASCII string: both use a single 0 byte as the
> > terminating null.
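A sketch of adding the terminator. It assumes, hypothetically, that the retrieved value arrives as a pointer plus a byte count without a trailing 0 byte. The real livido retrieval API may differ, and utf8_dup_terminated is an illustrative name.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy `len` bytes of UTF-8 data (e.g. retrieved
   from storage that does not keep a terminator) into a new buffer and
   append the single 0 byte that ends any C string.
   Returns NULL on allocation failure; the caller must free() the result. */
static char *utf8_dup_terminated(const char *data, size_t len)
{
    char *out;
    assert(data != NULL || len == 0);
    out = malloc(len + 1);          /* +1 for the terminating 0 byte */
    if (out == NULL)
        return NULL;
    if (len > 0)
        memcpy(out, data, len);
    out[len] = '\0';                /* same terminator as for ASCII */
    return out;
}
```

Nothing UTF-8 specific is involved: the length is in bytes, and the terminator is the same 0 byte used for ASCII.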
> >
> >
> >>
> >>
> >> 4) For testing purposes, I want to create a UTF-8 string. Is there a
> >> simple way to convert a char *string to UTF-8?
> >
> >
> > A char* string is a UTF-8 string, as long as its bytes are ASCII or
> > already UTF-8 encoded.
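If the test data is actually in ISO-8859-1 (Latin-1) rather than UTF-8, the conversion is mechanical, because every Latin-1 byte equals its Unicode code point. That encoding is an assumption, since the question does not say. The function name latin1_to_utf8 is hypothetical. For other source encodings, iconv(3) is the general-purpose route.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: convert an ISO-8859-1 (Latin-1) string to UTF-8.
   Each Latin-1 byte is the code point U+0000..U+00FF of the same value:
   bytes below 0x80 are copied unchanged (ASCII is already UTF-8), and
   bytes 0x80..0xFF become the two-byte sequence 110000xx 10xxxxxx.
   `out` must have room for 2 * strlen(in) + 1 bytes. */
static void latin1_to_utf8(const char *in, char *out)
{
    const unsigned char *p = (const unsigned char *)in;
    unsigned char *q = (unsigned char *)out;
    assert(in != NULL && out != NULL);
    for (; *p != '\0'; p++) {
        if (*p < 0x80) {
            *q++ = *p;
        } else {
            *q++ = (unsigned char)(0xC0 | (*p >> 6));
            *q++ = (unsigned char)(0x80 | (*p & 0x3F));
        }
    }
    *q = '\0';
}
```

For example, the Latin-1 string "caf\xE9" becomes the UTF-8 bytes "caf\xC3\xA9", while pure ASCII input comes out unchanged.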
> >
> >
> > Unless you plan on doing fancy things such as normalization, you can
> > treat UTF-8 strings as null-terminated sequences of non-null bytes.
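A sketch of why this byte-oriented view holds. Every byte of a UTF-8 multibyte sequence is 0x80 or above, so neither the 0 terminator nor an ASCII delimiter like '/' can occur inside another character. As a result, functions such as strrchr() and strstr() work unchanged. last_component is a hypothetical helper for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return a pointer to the last path component.
   Safe on UTF-8 because multibyte sequences never contain the byte '/'
   (0x2F); all of their bytes are >= 0x80. */
static const char *last_component(const char *path)
{
    const char *slash;
    assert(path != NULL);
    slash = strrchr(path, '/');
    return slash != NULL ? slash + 1 : path;
}
```

The same reasoning covers strstr(): a UTF-8 needle can only match at a character boundary of a valid UTF-8 haystack.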
> >
> >
>
>
> Great!
> Thanks for your quick reply :-)
>
> Gabriel.
>
>
>
> --
> Linux-UTF8: i18n of Linux on all levels
> Archive: http://mail.nl.linux.org/linux-utf8/
>