Re: strcoll for utf-8

Markus Kuhn Wed, 09 Jan 2002 09:58:30 -0800

Paul Michel wrote on 2002-01-09 14:37 UTC:
> But strtok() for instance does not handle utf-8
> data properly. Is this also in the standards? Reading
> at the two urls below, I could not see where it was
> explained that strcoll() does and strtok() does not...
>
> > See
> > http://mail.nl.linux.org/linux-utf8/2001-12/msg00042.html
> > and
> > http://www.opengroup.org/onlinepubs/007908799/xsh/strcoll.html


Well, just read the standard, which unambiguously contains all required
information and is freely available online:

http://www.opengroup.org/onlinepubs/007908799/xsh/strtok.html

  "A sequence of calls to strtok() breaks the string pointed to by s1 into
  a sequence of tokens, each of which is delimited by a byte from the
  string pointed to by s2."                             ^^^^

The meaning of the terms byte and character should be obvious, even in
UTF-8: strtok() is specified in terms of bytes, so every single byte of
s2 acts as a delimiter on its own. The only documentation that doesn't
yet make this distinction clear is the glibc manual, so feel free to
volunteer and fix that one as well.
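
To make the byte/character distinction concrete, here is a minimal
sketch in C. The helper name split_bytes() and the sample strings in the
note below are made up for illustration; only strtok() itself comes from
the standard.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of any standard: splits `text` in place
 * with strtok() and stores up to `max` token pointers in `out`.
 * Returns the number of tokens stored. Every byte of `delims` is
 * treated as a separate delimiter, exactly as the standard says. */
static size_t split_bytes(char *text, const char *delims,
                          char **out, size_t max)
{
    size_t n = 0;
    char *tok;

    assert(text != NULL && delims != NULL && out != NULL);

    for (tok = strtok(text, delims); tok != NULL && n < max;
         tok = strtok(NULL, delims))
        out[n++] = tok;

    return n;
}
```

With an ASCII delimiter this is safe on UTF-8 data, because bytes below
0x80 never occur inside a UTF-8 multibyte sequence: splitting
"Z\xc3\xbcrich,K\xc3\xb6ln" on "," gives the two intact tokens. With a
non-ASCII delimiter such as "\xc3\xa9" (e-acute), the delimiter set is
really the two bytes 0xC3 and 0xA9, so "Z\xc3\xbcrich" is cut in the
middle of the u-umlaut, leaving "Z" and a second token that starts with
the stray byte 0xBC.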

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
