Roland Mainz wrote:
> Don Cragun wrote:
  ... ... ...
>> The standard does not currently specify a way to count (multi-byte)
>> characters even though this means tail output may start or end in the
>> middle of a multi-byte character when using the -c option.
> 
> Is this even possible to specify characters in this case ? AFAIK the
> multibyte API doesn't have a way to seek into a random position in a
> file and then find the start of the next multibyte character (it works
> for UTF-8 but I am not sure for older encoding systems).
> 
> ----
> 
> Bye,
> Roland

Hi Roland,
It depends on the codeset being used.  To work correctly on codesets
that don't self-identify the first or last byte of a multi-byte
character, you have to start at the first byte in the file and scan
forward to the end, marking character boundaries along the way, and
then search backwards through those marks to get to the requested
starting point.

Note, however, that you have the same problem counting lines.  It isn't
any easier to find a <newline> character in a stream of bytes than it is
to find any other character.

  - Don
