Re: Bug reported regarding Unicode handling in email address

Ken Hornstein Mon, 07 Jun 2021 08:09:12 -0700

>> The address parser code is used for a lot of things.  The specific bug
>> report was about a draft message that contained Cyrillic characters.
>> We know what that character set was in THAT case, because it's a draft
>> message and we can derive the locale from the environment or the nmh
>> locale setting.  But if we are processing an email message then we
>> don't easily know the character set.  In theory it should either be
>> us-ascii or utf-8, but reality sometimes intrudes and it could be
>> anything.
>
>If it's an email then won't it be ASCII?


Boy, you're out of the loop!  Check out RFC 6532, which allows UTF-8
in message header fields.

>> I think really instead of using ctype macros, we should be using a
>> specific set of macros tailored for email addresses.
>
>Isn't the problem that one routine is being used to parse emails which
>should comply with the RFCs and also draft emails where it's up to nmh
>to decide the allowable format?  We should be parsing ASCII-encoded
>fields for display in the user's locale with one routine and
>locale-encoded fields for transmission as ASCII with a second routine.

I mean ... yes?  Like many things, there's a lot of overloading (see:
using email header parsing routines for config files).  But I think
in practice, as long as we don't interpret non-ASCII bytes as "spaces",
we'll be fine.  Like I said, for parsing an email header we shouldn't
be using ctype macros AT ALL, but email-specific macros.
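
A rough sketch of what locale-independent, email-specific classifiers
could look like.  The em_* names are hypothetical and not taken from the
nmh source; the character sets follow the RFC 5322 definitions of WSP
and "specials".  The point is that a byte such as 0xA0 (a no-break space
in ISO-8859-1, but also the second byte of Cyrillic U+0420 in UTF-8) is
never classified as whitespace, whatever the current locale says:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helpers; names are illustrative, not from nmh. */

/* RFC 5322 WSP: only space and horizontal tab, regardless of locale. */
static inline bool
em_iswsp(unsigned char c)
{
    return c == ' ' || c == '\t';
}

/* Whitespace that may appear in folded headers: WSP plus CR and LF. */
static inline bool
em_isfws(unsigned char c)
{
    return em_iswsp(c) || c == '\r' || c == '\n';
}

/* High-bit bytes: part of a UTF-8 sequence (RFC 6532) or of some
 * legacy 8-bit charset.  Treated as ordinary text, never as space. */
static inline bool
em_is8bit(unsigned char c)
{
    return c >= 0x80;
}

/* RFC 5322 "specials".  The NUL check is needed because strchr()
 * would otherwise match the string terminator. */
static inline bool
em_isspecial(unsigned char c)
{
    return c != '\0' && strchr("()<>[]:;@\\,.\"", c) != NULL;
}

/* Skip leading WSP in a header value without consulting the locale. */
static const char *
em_skip_wsp(const char *s)
{
    assert(s != NULL);
    while (em_iswsp((unsigned char) *s))
        s++;
    return s;
}
```

Taking an unsigned char also sidesteps the usual problem of passing a
negative char value to a classification routine.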

--Ken
