> > in /mail/lib/pipeto.lib, the line
> >
> > 	sed '/^$/,$ s/^From / From /' >$TMP.msg
> >
> > needs to be replaced with a c program that does this
> > conversion without coercing its input text into utf-8.
> >
> > russ
>
> unfortunately, i think the patch is on the wrong track.
> sed isn't coercing its input to utf-8.  there's no active
> conversion going on.  plan 9 programs assume utf-8 input,
> since plan 9 uses utf-8.

i said coerce, not convert.  sed is treating its input as utf-8,
like most plan 9 programs, but raw mail messages might be some
other 8-bit ascii-compatible encoding.  so the bytes that are not
valid utf-8 sequences are getting mangled by the coercion into a
Rune buffer.

> i think a better solution to this is to convert the incoming
> message to utf-8 first.  there are likely more problems similar
> to this one as plan 9 tools make valid assumptions that upas doesn't
> honour.

most plan 9 tools are used on the upas presentation of a mailbox,
which *is* in utf-8.  very few tools operate directly on the 8-bit
mail message.  pipeto.lib is one of the few, and even there it just
works to get its input into an mbox and then invokes upas/fs.

attempting to perform any conversion of the raw message is a
mistake.  you're almost guaranteed to lose some information, and
with little to no benefit (thanks to everything using upas/fs to
access mail).

russ
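
for illustration only, here is a minimal sketch of what a byte-wise replacement for the sed line could look like. it is not the patch discussed in this thread: it is written against portable ISO C rather than the plan 9 C library, and the function name escapefrom and its buffer-based interface are assumptions made up for the example. the point it shows is that the "From " escaping only needs to compare raw bytes, so nothing is ever decoded into Runes.

```c
#include <stddef.h>
#include <string.h>

/*
 * Byte-wise equivalent of:  sed '/^$/,$ s/^From / From /'
 *
 * Copies n bytes from in to out. Starting at the first empty line
 * (the end of the headers), every line that begins with "From "
 * gets one space inserted in front of it. Bytes are compared and
 * copied as-is and never decoded, so data that is not valid UTF-8
 * passes through unchanged.
 *
 * out must have room for n + n/5 + 1 bytes.
 * Returns the number of bytes written to out.
 */
size_t
escapefrom(const unsigned char *in, size_t n, unsigned char *out)
{
	size_t i = 0, o = 0;
	int inbody = 0;

	while (i < n) {
		/* length of the current line, newline included if present */
		const unsigned char *nl = memchr(in + i, '\n', n - i);
		size_t len = nl ? (size_t)(nl - (in + i)) + 1 : n - i;

		/* the first empty line opens the range, as /^$/ does in sed */
		if (!inbody && len == 1 && in[i] == '\n')
			inbody = 1;

		/* inside the range, escape lines that start with "From " */
		if (inbody && len >= 5 && memcmp(in + i, "From ", 5) == 0)
			out[o++] = ' ';

		memcpy(out + o, in + i, len);
		o += len;
		i += len;
	}
	return o;
}
```

a caller would read the raw message into a buffer, call escapefrom on it, and write the result to $TMP.msg. the header lines, including the leading "From " envelope line, are left alone because the range does not open until the first empty line.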
