Roland Mainz wrote:
> Don Cragun wrote:
... ... ...
>> The standard does not currently specify a way to count (multi-byte)
>> characters even though this means tail output may start or end in the
>> middle of a multi-byte character when using the -c option.
>
> Is this even possible to specify characters in this case ? AFAIK the
> multibyte API doesn't have a way to seek into a random position in a
> file and then find the start of the next multibyte character (it works
> for UTF-8 but I am not sure for older encoding systems).
>
> ----
>
> Bye,
> Roland

Hi Roland,

It depends on the codeset being used. To work correctly on codesets
that don't self identify the first or last byte of a multi-byte
character, you have to start at the first byte in the file, scan to the
end and then search backwards after marking appropriate bytes along the
way to get to the requested starting point.

Note, however, that you have the same problem counting lines. It isn't
any easier to find a <newline> character in a stream of bytes than it is
to find any other character.

 - Don
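
The forward scan described above can be sketched roughly as follows. This is only an illustration, not anything the standard specifies: the function names `char_start_offsets` and `tail_chars` are hypothetical, Python's incremental decoder stands in for the C multibyte API, and the sketch assumes a stateless codeset such as Shift_JIS in which every completed byte sequence decodes to exactly one character.

```python
import codecs


def char_start_offsets(data: bytes, encoding: str) -> list[int]:
    """Scan forward from byte 0 and record where each character starts.

    Hypothetical helper: assumes a stateless codeset in which each
    completed byte sequence decodes to exactly one character.
    """
    decoder = codecs.getincrementaldecoder(encoding)()
    offsets = []
    start = 0
    for i in range(len(data)):
        # Feed one byte at a time; the decoder returns an empty string
        # until the bytes fed so far complete a character.
        if decoder.decode(data[i:i + 1]):
            offsets.append(start)
            start = i + 1
    return offsets


def tail_chars(data: bytes, count: int, encoding: str) -> bytes:
    """Return the bytes of the last `count` characters of `data`."""
    if count <= 0:
        return b""
    offsets = char_start_offsets(data, encoding)
    if count >= len(offsets):
        return data
    # Index backwards through the marked boundaries to the start point.
    return data[offsets[-count]:]


# "a", KATAKANA LETTER SO (0x83 0x5C in Shift_JIS), "b".
# The trail byte 0x5C is also the single-byte code for a backslash.
sample = b"a\x83\x5cb"
```

With this sample, taking the last two bytes (`sample[-2:]`) starts in the middle of the two-byte character, while `tail_chars(sample, 2, "shift_jis")` starts at the boundary recorded during the forward scan. The cost is a full read of the input, which is the point made above about codesets that do not self identify character boundaries.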