Re: Counting "characters".

Michael B. Allen Tue, 02 Apr 2002 23:56:47 -0800

On Tue, 2 Apr 2002 17:43:27 -0500
Glenn Maynard <[EMAIL PROTECTED]> wrote:


> >   mbslen   counts the number of characters where a "character" is
> >            something I still need to define.
> 
> And which definition is useful is very dependent on what you need it
> for.  I'd suggest figuring out the different uses you'd expect, and
> defining functions based on that.  (Defining a function and then finding
> uses for it is backwards.)
> 
> I'm assuming you don't have a specific application in mind, since you
> didn't answer Markus's question.

Ok, here's an example. The Document Object Model W3C spec describes some
'CharacterData' methods:

  http://www.w3.org/TR/1998/REC-DOM-Level-1-19981001/level-one-core.html#ID-FF21A306

My C implementation of this spec has functions for these methods like:

  DOM_String *DOM_CharacterData_substringData(DOM_CharacterData *data, int offset, int count);
  void DOM_CharacterData_deleteData(DOM_CharacterData *data, int offset, int count);

These offset and count parameters are described with phrases like 'The
number of characters to extract' or 'The character offset at which to
insert', etc. The DOM API is one of these XML peripherals, and so it
ultimately relies on the 'Char' type defined in the XML spec here:

  http://www.w3.org/TR/REC-xml#charsets

Which at one point has an actual "definition":

  [Definition: A character is an atomic unit of text as specified by
  ISO/IEC 10646 [ISO/IEC 10646] (see also [ISO/IEC 10646-2000]).]

But these XML specs are unavoidably bound to the Java language, so I think
Java's substring, charAt, and indexOf methods have a lot of influence
here.
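
To make the difference concrete, here is a rough sketch of two possible
meanings of "character count" for the same text. It assumes the string is
stored as well-formed, NUL-terminated UTF-8, which nothing above actually
fixes, and both function names are made up for illustration; they are not
part of my DOM implementation. The first counts ISO 10646 code points, the
second counts the 16-bit units a Java-style String would index by.

```c
#include <stddef.h>

/* Hypothetical helper: count code points in a well-formed UTF-8 string.
 * Every byte that is not a continuation byte (10xxxxxx) starts one. */
size_t utf8_count_codepoints(const char *s)
{
    size_t n = 0;
    for (; *s; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}

/* Hypothetical helper: count the UTF-16 code units the same text would
 * occupy. A 4-byte sequence (lead byte 11110xxx) encodes a code point
 * above U+FFFF, which UTF-16 stores as a surrogate pair (2 units). */
size_t utf8_count_utf16_units(const char *s)
{
    size_t n = 0;
    for (; *s; s++) {
        unsigned char c = (unsigned char)*s;
        if ((c & 0xC0) == 0x80)
            continue;                       /* continuation byte */
        n += ((c & 0xF8) == 0xF0) ? 2 : 1;  /* surrogate pair or single unit */
    }
    return n;
}
```

The two counts agree for anything in the Basic Multilingual Plane and only
diverge for code points above U+FFFF, so an offset passed to something like
substringData would land in a different place depending on which definition
is chosen. Neither function validates its input or says anything about
combining sequences.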

I guess I should look up 'atomic unit of text' in the ISO-10646 doc. That
sounds interesting.

Thanks,
Mike

-- 
May The Source be with you.

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/
