Paul Michel wrote on 2002-01-09 14:37 UTC:
> But strtok() for instance does not handle utf-8
> data properly. Is this also in the standards? Reading
> at the two urls below, I could not see where it was
> explained that strcoll() does and strtok() does not...
>
> >See
> >http://mail.nl.linux.org/linux-utf8/2001-12/msg00042.html
> >and
> >http://www.opengroup.org/onlinepubs/007908799/xsh/strcoll.html

Well, just read the standard, which unambiguously contains all required
information and is freely available online:

  http://www.opengroup.org/onlinepubs/007908799/xsh/strtok.html

  "A sequence of calls to strtok() breaks the string pointed to by s1
  into a sequence of tokens, each of which is delimited by a byte from
                                                             ^^^^
  the string pointed to by s2."

The meaning of the terms byte and character should be obvious, even in
UTF-8.

The only documentation that doesn't make this distinction very clear yet
is the glibc manual, so feel free to volunteer and fix that one as well.

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: <http://www.cl.cam.ac.uk/~mgk25/>

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
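
The byte-versus-character distinction in the strtok() description quoted
above can be illustrated with a small C sketch. The helper name
split_bytes() and the sample strings are assumptions made up for this
illustration; only strtok() itself comes from the standard. The sketch
passes the two-byte UTF-8 encoding of U+00E9 (0xC3 0xA9) as s2, so
strtok() treats 0xC3 and 0xA9 as two independent delimiter bytes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of any standard): tokenizes buf in place
 * with strtok() and stores up to max token pointers in out.
 * Returns the number of tokens stored. */
size_t split_bytes(char *buf, const char *delims, char **out, size_t max)
{
    size_t n = 0;
    char *tok = strtok(buf, delims);

    while (tok != NULL && n < max) {
        out[n++] = tok;
        /* Each delimiter is a single byte of delims, never a whole
         * multibyte character. */
        tok = strtok(NULL, delims);
    }
    return n;
}
```

Called on the bytes 'x' 0xC3 0xA8 'y' (that is, "x", U+00E8, "y" in UTF-8)
with the delimiter string 0xC3 0xA9, this should split at the 0xC3 byte
even though U+00E8 is a different character from U+00E9, leaving a second
token that starts with the stray continuation byte 0xA8. Delimiters drawn
only from ASCII do not have this problem, because bytes below 0x80 never
occur inside a UTF-8 multibyte sequence.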
