ah yes, the dreaded partial rune problem. lots of programs
must cope with this issue.
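
for illustration only, here is a minimal sketch of one common way to cope,
not the code in tcs: after each read, pass on only the bytes that make up
whole utf sequences and carry the incomplete tail over to the next read.
the names complete_prefix, copy_runes and emit are hypothetical, and N is
set to 10 as in the report below. on plan 9, fullrune() answers a similar
question for a single rune.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

enum { N = 10 };	/* small on purpose, to exercise the boundary case */

/*
 * hypothetical helper: how many leading bytes of buf[0..n) can be
 * decoded now without splitting a utf sequence. an incomplete
 * sequence at the end is excluded; malformed input is passed
 * through so the decoder can report it.
 */
static size_t
complete_prefix(const unsigned char *buf, size_t n)
{
	size_t i, need;

	/* back up over at most 3 trailing continuation bytes (10xxxxxx) */
	i = n;
	while(i > 0 && n - i < 3 && (buf[i-1] & 0xC0) == 0x80)
		i--;
	if(i == 0)
		return n;	/* empty, or no lead byte in sight */
	i--;			/* buf[i] is the candidate lead byte */
	if(buf[i] < 0x80)
		need = 1;
	else if((buf[i] & 0xE0) == 0xC0)
		need = 2;
	else if((buf[i] & 0xF0) == 0xE0)
		need = 3;
	else if((buf[i] & 0xF8) == 0xF0)
		need = 4;
	else
		return n;	/* not a valid lead byte */
	return n - i < need ? i : n;
}

/*
 * hypothetical read loop: emit() only ever sees whole sequences,
 * except for a truncated one left over at end of file.
 */
int
copy_runes(int fd, void (*emit)(const unsigned char*, size_t))
{
	unsigned char buf[N];
	size_t tot, ok;
	ssize_t n;

	tot = 0;
	while((n = read(fd, buf+tot, N-tot)) > 0){
		tot += n;
		ok = complete_prefix(buf, tot);
		emit(buf, ok);
		memmove(buf, buf+ok, tot-ok);	/* carry the partial rune */
		tot -= ok;
	}
	if(tot > 0)
		emit(buf, tot);
	return n < 0 ? -1 : 0;
}
```

at most 3 bytes are ever carried over, so any N of 4 or more leaves room
for the next read.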
-rob
On 8/31/05, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> Hello,
>
> tcs, on both plan 9 and unix, has a bug in reading utf text.
> it comes from:
> utf_in(int fd, long *notused, struct convert *out){
> char buf[N];
> ...
> while((n = read(fd, buf+tot, N-tot)) >= 0){
> ...
> }
>
> in utf.c
>
> N is defined as 10000 in hdr.h
>
> if you set N to 10, the problem shows up more clearly:
> tcs does not correctly handle a utf character that straddles a buffer boundary.
>
> for example, assume a.txt has the content:
> aaaaaaaこの
>
> term% xd -c a.txt
> 0000000 a a a a a a a e3 81 93 e3 81 ae \n
> 000000e
>
> tcs can handle this text because with N=10 the first read ends exactly on a utf boundary,
> but tcs fails if there are 6 or 8 'a's ...
>
> tcs is very important for me.
> Who maintains tcs?
> I could help with debugging.
>
> Kenji Arisawa
>
>