>Should nmh try to get out in front with email address
>internationalization (EAI)?  See resources below.

I've thought about what it would take.

>From the MUA perspective, IIUC, it relies on native support on the
>host to handle unencoded UTF-8 addresses.  Would nmh support just be a
>matter of 1) not encoding addresses (controlled by a switch) in
>outgoing messages and 2) when showing a message, indicating that an
>address couldn't be displayed?

I think it's slightly more complicated than that (see below).

>Does anyone have experience using it?  Gmail supports it, according
>to the article below.

I think the lack of people with such an address means it's pretty uncommon
still, right?

Lyndon writes later:

>Since we require a Posix environment, that means utf8 locale support must
>be in place, thus all the OS bits are there waiting to be used.
>
>But to do this properly we really need to overhaul the code base to
>process everything internally as utf8.  That's not a trivial task, but we
>have to do it, sooner or later.

Here are my unformed thoughts:

- It's not so easy to deal with characters that aren't in your native
  locale using the POSIX API; xlocale makes this easier, but it's a pain.

- A super-brief scan suggests to me that SMTPUTF8 support is not widespread
  at this point.  But that will no doubt change.

- Right now our address parser will reject stuff that contains 8-bit
  characters; we need to fix that.  In fact, we need to throw out that
  address parser and get a new one; I made some progress on that using
  flex and bison.

- It's unclear to me how much UTF-8 verification an MUA is supposed to
  deal with; are we, for example, supposed to check for overlong UTF-8
  encodings?  Valid UTF-8 sequences?

- I do not believe we have to process everything internally as UTF-8, but
  I could be persuaded I'm wrong.  The real kicker is the format engine;
  right now we sort-of cheat a lot.  %(decode) basically does a one-stop
  decoding and conversion to the native character set.
  This has a lot of advantages, but also means we need to sit down and
  decide what the format engine is really supposed to be working on; for
  example, is the format engine supposed to be dealing with strings pre
  or post RFC 2047 decoding?

- SMTPUTF8 looks relatively straightforward to implement, at least.

- I would rather not make ICU or IDN a build requirement, but it may be
  unavoidable.

--Ken
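
On the UTF-8 verification question in the list above (overlong encodings,
valid sequences), here is a minimal sketch in C of what a strict check
could look like. The function name utf8_valid is hypothetical and not part
of nmh; the rules it applies are the well-formedness constraints from
RFC 3629. Whether an MUA should be this strict is exactly the open
question, so treat this as an illustration rather than a proposal.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not an nmh function): returns 1 if buf[0..len)
 * is well-formed UTF-8 per RFC 3629, 0 otherwise.  Rejects overlong
 * forms, UTF-16 surrogates (U+D800..U+DFFF) and code points above
 * U+10FFFF.
 */
int
utf8_valid(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = buf[i];
        size_t need;        /* continuation bytes expected */
        unsigned long cp;   /* code point being decoded */
        unsigned long min;  /* smallest code point legal at this length */

        if (c < 0x80) {                 /* plain ASCII */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            need = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {
            need = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {
            need = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return 0;   /* stray continuation byte or lead byte >= 0xF8 */
        }

        if (len - i - 1 < need)
            return 0;   /* sequence truncated at end of buffer */

        for (size_t k = 1; k <= need; k++) {
            unsigned char cc = buf[i + k];
            if ((cc & 0xC0) != 0x80)
                return 0;   /* not a continuation byte */
            cp = (cp << 6) | (cc & 0x3F);
        }

        if (cp < min)
            return 0;   /* overlong encoding */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;   /* surrogate, never valid in UTF-8 */
        if (cp > 0x10FFFF)
            return 0;   /* beyond the Unicode range */

        i += need + 1;
    }
    return 1;
}
```

A caller would run this over the raw bytes of an address before deciding
whether to emit it unencoded; for instance, the two-byte sequence 0xC0 0xAF
(an overlong encoding of "/") is rejected, while 0xC3 0xA9 (U+00E9) is
accepted.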
