On Mon, Sep 26, 2016 at 06:22:11PM -0700, Junio C Hamano wrote:
> Even though latin-1 is still seen in e-mail headers, some platforms
> only install ISO-8859-1. "iconv -f ISO-8859-1" succeeds, while
> "iconv -f latin-1" fails on such a system.
> Using the same fallback_encoding() mechanism factored out in the
> previous step, teach ourselves that "ISO-8859-1" has a better chance
> of being accepted than "latin-1".

I was curious if this was the most official or accepted spelling.
Grepping a few hundred thousand messages from my mail archives, it does
seem to be the most common.

> diff --git a/utf8.c b/utf8.c
> index 550e785..0c8e011 100644
> --- a/utf8.c
> +++ b/utf8.c
> @@ -501,6 +501,13 @@ static const char *fallback_encoding(const char *name)
>  	if (is_encoding_utf8(name))
>  		return "UTF-8";
> +	/*
> +	 * Even though latin-1 is still seen in e-mail
> +	 * headers, some platforms only install ISO-8859-1.
> +	 */
> +	if (!strcasecmp(name, "latin-1"))
> +		return "ISO-8859-1";

For the UTF-8 fallbacks, we actually detect their equivalence via
same_encoding() before even hitting iconv. Is it worth doing the same
here?

I have to admit that I don't care too deeply about performance for
somebody who wants to convert "latin1" to "ISO-8859-1". If one of your
encodings is not UTF-8, you are probably Doing It Wrong. :)
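
To make the same_encoding() question concrete, here is a rough sketch of
what an early equivalence check for Latin-1 spellings might look like,
next to the fallback from the patch. This is not the code in utf8.c: the
names is_latin1_alias(), sketch_same_encoding() and
sketch_fallback_encoding() are made up for illustration, and the alias
list is only an assumption about which spellings we would want to treat
as equal.

```c
#include <stddef.h>
#include <strings.h>	/* strcasecmp() (POSIX) */

/*
 * Hypothetical helper: return 1 if "name" is one of the Latin-1
 * spellings we choose to recognize. The list is illustrative only.
 */
static int is_latin1_alias(const char *name)
{
	static const char *aliases[] = {
		"latin-1", "latin1", "ISO-8859-1", "ISO8859-1"
	};
	size_t i;

	for (i = 0; i < sizeof(aliases) / sizeof(aliases[0]); i++)
		if (!strcasecmp(name, aliases[i]))
			return 1;
	return 0;
}

/*
 * Sketch of an equivalence check that could run before iconv is
 * consulted at all: two Latin-1 spellings compare equal, and anything
 * else falls back to a case-insensitive name comparison.
 */
static int sketch_same_encoding(const char *src, const char *dst)
{
	if (is_latin1_alias(src) && is_latin1_alias(dst))
		return 1;
	return !strcasecmp(src, dst);
}

/*
 * Sketch of the fallback from the patch: map "latin-1" to the
 * spelling that iconv is more likely to accept, and pass every
 * other name through unchanged.
 */
static const char *sketch_fallback_encoding(const char *name)
{
	if (!strcasecmp(name, "latin-1"))
		return "ISO-8859-1";
	return name;
}
```

With something like this, a "latin-1" to "ISO-8859-1" conversion would be
recognized as a no-op before iconv_open() is ever called, while the
fallback still covers the case where only one side is Latin-1.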