Hi Ken,

> > It's early morning for me, and I'm still at least a liter of Diet
> > Mountain Dew away from being sufficiently caffeinated to be
> > positive, but that looks like "not totally correct, but a lot closer
> > than what we have now".
> >
> > In particular, that will accept overlong and illegal utf-8
> > codepoints, and probably misbehaves in strange and unusual
> > non-ascii/non-utf-8 things like iso2022-jp.
>
> So, the DETAILS are complicated.
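For what it's worth, the "overlong" problem above is mechanical to check: each UTF-8 sequence length has a minimum codepoint, so a decoder just has to reject lead/continuation byte combinations below that minimum (plus UTF-16 surrogates and anything past U+10FFFF). A minimal sketch of such a check, not anything from nmh's tree:

```c
#include <stddef.h>

/* Return the length (1-4) of a valid UTF-8 sequence starting at s,
 * or 0 if it is truncated, overlong, a UTF-16 surrogate, or encodes
 * a codepoint beyond U+10FFFF.  A sketch only, not nmh code. */
static int
utf8_check(const unsigned char *s, size_t len)
{
    if (len == 0)
        return 0;
    if (s[0] < 0x80)
        return 1;                           /* plain ASCII */
    if ((s[0] & 0xE0) == 0xC0) {            /* 2-byte sequence */
        if (len < 2 || (s[1] & 0xC0) != 0x80)
            return 0;
        if (s[0] < 0xC2)                    /* 0xC0/0xC1: overlong */
            return 0;
        return 2;
    }
    if ((s[0] & 0xF0) == 0xE0) {            /* 3-byte sequence */
        if (len < 3 || (s[1] & 0xC0) != 0x80 || (s[2] & 0xC0) != 0x80)
            return 0;
        if (s[0] == 0xE0 && s[1] < 0xA0)    /* overlong */
            return 0;
        if (s[0] == 0xED && s[1] >= 0xA0)   /* UTF-16 surrogate */
            return 0;
        return 3;
    }
    if ((s[0] & 0xF8) == 0xF0) {            /* 4-byte sequence */
        if (len < 4 || (s[1] & 0xC0) != 0x80 ||
            (s[2] & 0xC0) != 0x80 || (s[3] & 0xC0) != 0x80)
            return 0;
        if (s[0] == 0xF0 && s[1] < 0x90)    /* overlong */
            return 0;
        if (s[0] == 0xF4 && s[1] >= 0x90)   /* beyond U+10FFFF */
            return 0;
        if (s[0] > 0xF4)
            return 0;
        return 4;
    }
    return 0;                               /* stray continuation byte */
}
```

The classic example it catches is the overlong encoding of '/' as 0xC0 0xAF, which a naive decoder happily accepts.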
This is nmh. :-)

> The address parser code is used for a lot of things. The specific bug
> report was about a draft message that contained Cyrillic characters.
> We know what that character set was in THAT case, because it's a draft
> message and we can derive the locale from the environment or the nmh
> locale setting. But if we are processing an email message then we
> don't easily know the character set. In theory it should either be
> us-ascii or utf-8, but reality sometimes intrudes and it could be
> anything.

If it's an email then won't it be ASCII?

> I think really instead of using ctype macros, we should be using a
> specific set of macros tailored for email addresses.

Isn't the problem that one routine is being used to parse emails which
should comply with the RFCs and also draft emails where it's up to nmh
to decide the allowable format?  We should be parsing ASCII-encoded
fields for display in the user's locale with one routine and
locale-encoded fields for transmission as ASCII with a second routine.

--
Cheers, Ralph.
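P.S. To make the ctype point concrete: isalnum() and friends are
locale-dependent, so in a Cyrillic locale they accept bytes that are
not legal in an unquoted RFC 5322 atom. A classifier tailored to the
RFC's "atext" production is locale-independent by construction.
A sketch of that idea (is_atext is my name, not an existing nmh macro):

```c
#include <string.h>

/* Specials permitted in RFC 5322 "atext" besides ASCII letters/digits. */
#define ATEXT_SPECIALS "!#$%&'*+-/=?^_`{|}~"

/* Nonzero iff c may appear in an unquoted address atom.  Unlike
 * isalnum(), this never varies with the locale.  Sketch only. */
static int
is_atext(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') ||
           (c != '\0' && strchr(ATEXT_SPECIALS, c) != NULL);
}
```

The `c != '\0'` guard matters because strchr() would otherwise match
the string's terminating NUL and report 0 as an atom character.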
