>> The address parser code is used for a lot of things.  The specific bug
>> report was about a draft message that contained Cyrillic characters.
>> We know what that character set was in THAT case, because it's a draft
>> message and we can derive the locale from the environment or the nmh
>> locale setting.  But if we are processing an email message then we
>> don't easily know the character set.  In theory it should either be
>> us-ascii or utf-8, but reality sometimes intrudes and it could be
>> anything.
>
>If it's an email then won't it be ASCII?

Boy, you're out of the loop!  Check out RFC 6532, which allows UTF-8 in
message headers.

>> I think really instead of using ctype macros, we should be using a
>> specific set of macros tailored for email addresses.
>
>Isn't the problem that one routine is being used to parse emails which
>should comply with the RFCs and also draft emails where it's up to nmh
>to decide the allowable format?  We should be parsing ASCII-encoded
>fields for display in the user's locale with one routine and
>locale-encoded fields for transmission as ASCII with a second routine.

I mean ... yes?  Like many things there's a lot of overloading (see:
using email header parsing routines for config files).  But I think in
practice as long as we don't interpret non-ASCII bytes as "spaces" we'll
be fine.  Like I said, for parsing an email header we really shouldn't
be using ctype macros AT ALL but email-specific macros.

--Ken
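
P.S. To make "email-specific macros" concrete, here is a rough sketch of
the kind of thing I mean.  None of these names exist in nmh today; the
hdr_* functions are hypothetical and only illustrate the idea.  The
point is that they look at byte values alone, so the locale can never
cause a byte >= 0x80 (part of a UTF-8 or legacy 8-bit character) to be
classified as whitespace, and there is no problem with passing a
negative char to a ctype macro.

```c
#include <stdbool.h>

/* Hypothetical header-parsing predicates; not part of nmh. */

/* Whitespace in a header: SP and HTAB (RFC 5322 WSP), plus CR and LF
 * so that folded lines are handled.  Locale-independent on purpose. */
static inline bool hdr_isspace(unsigned char c)
{
    return c == ' ' || c == '\t' || c == '\r' || c == '\n';
}

/* Any byte outside US-ASCII.  These are treated as opaque text. */
static inline bool hdr_isnonascii(unsigned char c)
{
    return c >= 0x80;
}

/* The RFC 5322 "specials" that delimit tokens in an address. */
static inline bool hdr_isspecial(unsigned char c)
{
    switch (c) {
    case '(': case ')': case '<': case '>': case '[': case ']':
    case ':': case ';': case '@': case '\\': case ',': case '.':
    case '"':
        return true;
    default:
        return false;
    }
}

/* Skip leading header whitespace; stops at the first other byte,
 * including any non-ASCII byte. */
static inline const char *hdr_skipspace(const char *s)
{
    while (*s != '\0' && hdr_isspace((unsigned char)*s))
        s++;
    return s;
}
```

With something like this, a parser calling hdr_skipspace() on a header
that starts with Cyrillic text would stop at the first byte of the
first character no matter what the locale is set to.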