Loic> Yes, and that's a very good reason for choosing UTF-8 as an
Loic> internal charset. However, functions like strndup or strncmp, and
Loic> in general string functions that need to move to the Nth
Loic> character, have a problem with UTF-8, and alternative functions
Loic> must be re-implemented.
The "N" argument to these functions refers to bytes, not characters.
You can still use these same functions; you just need a way to map the
character number to the byte number.
I wouldn't be averse to implementing a function like that. In fact it
would be very useful. What should it be called?
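As a rough sketch of the character-to-byte mapping mentioned above,
something like the following might do. The name
utf8_char_to_byte_offset is only a placeholder, not an existing
function, and the code assumes the input is valid UTF-8:

```c
#include <stddef.h>

/* Hypothetical helper (name is a placeholder): return the byte offset
 * of the n-th character (0-based) in the NUL-terminated UTF-8 string s.
 * If s has fewer than n characters, the offset of the terminating NUL
 * is returned. Assumes s is valid UTF-8. */
size_t utf8_char_to_byte_offset(const char *s, size_t n)
{
    size_t i = 0;

    while (s[i] != '\0' && n > 0) {
        /* Step over the lead byte of the current character. */
        i++;
        /* Skip its continuation bytes, which all look like 10xxxxxx. */
        while (((unsigned char)s[i] & 0xC0) == 0x80)
            i++;
        n--;
    }
    return i;
}
```

With this, copying the first n characters of s would be
strndup(s, utf8_char_to_byte_offset(s, n)): the byte-oriented function
is unchanged, only its length argument is computed differently.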
Loic> Same problem for functions like strchr since the char argument
Loic> must be a string and not a char for UTF-8 sequences that are
Loic> more than one char.
That's true. Ideally I suppose the char would be a unicode_char_t,
and internally we could convert and use strstr or something.
I'll add this to TODO.
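A minimal sketch of that idea, for illustration only. The typedef of
unicode_char_t and the names utf8_encode and utf8_strchr are
assumptions made for this example; the sketch does not reject
surrogate code points and assumes the haystack is valid UTF-8:

```c
#include <stddef.h>
#include <string.h>

/* Assumption: unicode_char_t holds a single Unicode code point. */
typedef unsigned long unicode_char_t;

/* Hypothetical helper: encode c as UTF-8 into buf (at least 5 bytes),
 * NUL-terminate it, and return the number of bytes written before the
 * NUL, or 0 if c is beyond U+10FFFF. */
static size_t utf8_encode(unicode_char_t c, char *buf)
{
    size_t len;

    if (c < 0x80) {
        buf[0] = (char)c;
        len = 1;
    } else if (c < 0x800) {
        buf[0] = (char)(0xC0 | (c >> 6));
        buf[1] = (char)(0x80 | (c & 0x3F));
        len = 2;
    } else if (c < 0x10000) {
        buf[0] = (char)(0xE0 | (c >> 12));
        buf[1] = (char)(0x80 | ((c >> 6) & 0x3F));
        buf[2] = (char)(0x80 | (c & 0x3F));
        len = 3;
    } else if (c < 0x110000) {
        buf[0] = (char)(0xF0 | (c >> 18));
        buf[1] = (char)(0x80 | ((c >> 12) & 0x3F));
        buf[2] = (char)(0x80 | ((c >> 6) & 0x3F));
        buf[3] = (char)(0x80 | (c & 0x3F));
        len = 4;
    } else {
        return 0;
    }
    buf[len] = '\0';
    return len;
}

/* Hypothetical strchr analogue: find the first occurrence of the
 * character c in the UTF-8 string s by converting c to its byte
 * sequence and handing the search to strstr. */
char *utf8_strchr(const char *s, unicode_char_t c)
{
    char buf[5];

    if (c == 0)
        return (char *)s + strlen(s);   /* mirrors strchr(s, '\0') */
    if (utf8_encode(c, buf) == 0)
        return NULL;
    return strstr(s, buf);
}
```

Plain strstr is enough here because a lead byte can never be mistaken
for a continuation byte in valid UTF-8, so a match cannot begin in the
middle of another character.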
Loic> There also is an issue regarding case transformation for
Loic> strcasecmp and others.
I'm also adding that to TODO.
Loic> Ok, I'll keep that in mind. I understand that the master CVS
Loic> site is gnome.org.
Yes.
Loic> Good. I've not been able to find the canonical distribution of
Loic> this latest regexp package though. Does it exist?
It does exist but offhand I don't know where to get it. A modified
version appears in Tcl; maybe it has a reference to the original.
Tom