On Tue, 2 Apr 2002 17:43:27 -0500 Glenn Maynard <[EMAIL PROTECTED]> wrote:
> > mbslen counts the number of characters where a "character" is
> > something I still need to define.
>
> And which definition is useful is very dependent on what you need it
> for. I'd suggest figuring out the different uses you'd expect, and
> defining functions based on that. (Defining a function and then finding
> uses for it is backwards.)
>
> I'm assuming you don't have a specific application in mind, since you
> didn't answer Markus's question.

Ok, here's an example. The Document Object Model W3C spec describes some
'CharacterData' methods:

  http://www.w3.org/TR/1998/REC-DOM-Level-1-19981001/level-one-core.html#ID-FF21A306

My C implementation of this spec has functions for these methods like:

    DOM_String *DOM_CharacterData_substringData(DOM_CharacterData *data,
                                                int offset, int count);
    void DOM_CharacterData_deleteData(DOM_CharacterData *data,
                                      int offset, int count);

These offset and count parameters are described like 'The number of
characters to extract' or 'The character offset at which to insert', etc.
The DOM API is one of these XML peripherals, and so the 'Char' type is
ultimately defined in the XML spec here:

  http://www.w3.org/TR/REC-xml#charsets

which at one point has an actual "definition":

  [Definition: A character is an atomic unit of text as specified by
  ISO/IEC 10646 [ISO/IEC 10646] (see also [ISO/IEC 10646-2000]).]

But these XML specs are unavoidably bound to the Java language, so I think
Java's substring, charAt, and indexOf methods have a lot of influence
here.

I guess I should look up 'atomic unit of text' in the ISO-10646 doc. That
sounds interesting.

Thanks,
Mike

-- 
May The Source be with you.
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/
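
To make the offset/count question above concrete, here is a minimal sketch of one possible reading. It assumes the string is stored as valid UTF-8 and that a "character" is one Unicode code point. The helper names utf8_count_chars and utf8_byte_offset are hypothetical and are not part of the DOM implementation mentioned in this message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: nonzero if b starts a UTF-8 sequence, i.e. it is
 * not a continuation byte of the form 10xxxxxx. */
static int utf8_is_lead(unsigned char b)
{
    return (b & 0xC0) != 0x80;
}

/* Count code points in a NUL-terminated string.
 * Assumption: the input is valid UTF-8; no validation is done. */
size_t utf8_count_chars(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (utf8_is_lead((unsigned char)*s))
            n++;
    }
    return n;
}

/* Map a character (code point) offset to a byte offset.
 * If char_offset is past the end, the byte length of s is returned. */
size_t utf8_byte_offset(const char *s, size_t char_offset)
{
    const char *p = s;

    assert(s != NULL);
    while (*p != '\0') {
        if (utf8_is_lead((unsigned char)*p)) {
            if (char_offset == 0)
                break;          /* p points at the requested character */
            char_offset--;
        }
        p++;
    }
    return (size_t)(p - s);
}
```

With helpers like these, a substringData(data, offset, count) could copy the bytes between utf8_byte_offset(s, offset) and utf8_byte_offset(s, offset + count). Note that Java's charAt and substring index UTF-16 code units, so the two readings disagree for characters outside the Basic Multilingual Plane.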
