Nicolas Williams wrote:
> On Wed, Apr 22, 2009 at 11:16:09AM -0700, Don Cragun wrote:
> > The standard does not currently specify a way to count (multi-byte)
> > characters even though this means tail output may start or end in the
> > middle of a multi-byte character when using the -c option.
>
> How... painful.  Of course, for fixed width encodings
> and UTF-8/16 it

AFAIK Solaris doesn't have a UTF-16 based locale, and AFAIK UTF-16 can't
be supported by the POSIX multibyte API (at least I never saw it and
can't imagine how it should work) ...

> should be possible to automatically adjust the -c argument value so it
> starts at the start of a character, but that would require another
> argument.

Following the precedent of "wc" we could use "-C" (uppercase 'C') for
this purpose...
... but I am not sure whether it is possible for all encodings (e.g.
Shift-JIS, GBK, EUC etc.) to properly detect the start of a multibyte
character (anyone remember ISO-2022 ? =:-) ).

> Fortunately I bet tail -c ... is fairly uncommon.

There are several consumers in Solaris which use it.

----

Bye,
Roland

-- 
  __ .  . __
 (o.\ \/ /.o) roland.mainz at nrubsig.org
  \__\/\/__/  MPEG specialist, C&&JAVA&&Sun&&Unix programmer
  /O /==\ O\  TEL +49 641 3992797
 (;O/ \/ \O;)
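
P.S. To illustrate why the UTF-8 case is the easy one: every UTF-8
continuation byte has the bit pattern 10xxxxxx, so a byte offset that
lands inside a character can be moved to the next character start
without any decoder state. The following is only a minimal sketch under
stated assumptions; it is not taken from any tail implementation, the
function names are hypothetical, and it assumes the buffer holds valid
UTF-8. The same trick does not carry over to encodings such as
Shift-JIS, where trail bytes overlap the ranges used by lead bytes and
ASCII.

```c
#include <stddef.h>

/* Nonzero if b is a UTF-8 continuation byte (bit pattern 10xxxxxx). */
static int utf8_is_continuation(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

/*
 * Hypothetical helper: advance a byte offset until it no longer points
 * at a continuation byte, i.e. until it sits on a character start or
 * reaches len.  Assumes buf[0..len) is valid UTF-8.
 */
size_t utf8_align_forward(const unsigned char *buf, size_t len, size_t off)
{
    while (off < len && utf8_is_continuation(buf[off]))
        off++;
    return off;
}
```

For a "last N bytes" request the starting offset would be len - N;
aligning it forward drops the partial character at the front, so the
output may be a few bytes shorter than N but never begins in the middle
of a character.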