On 25.02.13 16:19, Jeff King wrote:
> On Mon, Feb 25, 2013 at 09:37:50AM +0100, Johannes Sixt wrote:
>> From: Johannes Sixt <j...@kdbg.org>
>> iconv on Windows does not know the encoding name "utf8", and does not
>> re-encode log messages when this name is given. Request "UTF-8" encoding.
>> Signed-off-by: Johannes Sixt <j...@kdbg.org>
>> I'm not sure whether I'm right to say that "UTF-8" is the correct
>> spelling. Anyway, 'iconv -l' on my old Linux box lists "UTF8", but on
>> Windows it does not.
> UTF-8 is correct according to:
>> A more correct fix would probably be to use is_encoding_utf8() in more
>> places, but it's outside my time budget to look after it.
> Yeah, I wonder if this is a symptom of a deeper issue, which is that
> utf-8 has many synonyms, and we would prefer to canonicalize the
> encoding name before generating an object to avoid inconsistencies (of
> course we cannot do so for every imaginable encoding, but utf-8 is a
> pretty obvious one we handle already). We _should_ be generating commits
> with no encoding header at all for utf-8, though.
> And indeed, it looks like that is the case. commit_tree_extended has:
> 	/* Not having i18n.commitencoding is the same as having utf-8 */
> 	encoding_is_utf8 = is_encoding_utf8(git_commit_encoding);
> 	if (!encoding_is_utf8)
> 		strbuf_addf(&buffer, "encoding %s\n", git_commit_encoding);
> which makes me think that this first hunk...
>> diff --git a/t/t4210-log-i18n.sh b/t/t4210-log-i18n.sh
>> index 52a7472..b1956e2 100755
>> --- a/t/t4210-log-i18n.sh
>> +++ b/t/t4210-log-i18n.sh
>> @@ -15,7 +15,7 @@ test_expect_success 'create commits in different encodings' '
>> git add msg &&
>> - git -c i18n.commitencoding=utf8 commit -F msg &&
>> + git -c i18n.commitencoding=UTF-8 commit -F msg &&
>> cat >msg <<-EOF &&
> ...should be a no-op; the utf8 there should never be seen by anybody but
> git. Can you confirm that is the case?
>> @@ -30,7 +30,7 @@ test_expect_success 'log --grep searches in log output encoding (utf8)' '
>> - git log --encoding=utf8 --format=%s --grep=$utf8_e >actual &&
>> + git log --encoding=UTF-8 --format=%s --grep=$utf8_e >actual &&
>> test_cmp expect actual
> This one will feed it to iconv, though, because the latin1 commit will
> need to be re-encoded. I think the simplest thing would just be:
> diff --git a/utf8.c b/utf8.c
> index 1087870..8d42b50 100644
> --- a/utf8.c
> +++ b/utf8.c
> @@ -507,6 +507,17 @@ char *reencode_string(const char *in, const char *out_encoding, const char *in_e
>  	if (!in_encoding)
>  		return NULL;
> +	/*
> +	 * Some platforms do not have the variously spelled variants of
> +	 * UTF-8, so let us feed iconv the most official spelling, which
> +	 * should hopefully be accepted everywhere.
> +	 */
> +	if (is_encoding_utf8(in_encoding))
> +		in_encoding = "UTF-8";
> +	if (is_encoding_utf8(out_encoding))
> +		out_encoding = "UTF-8";
>  	conv = iconv_open(out_encoding, in_encoding);
>  	if (conv == (iconv_t) -1)
>  		return NULL;
> Does that fix the tests for you? It's a larger change, but I think it
> makes git friendlier all around for people on Windows.
Thanks, I'm OK with your version.
The updated t4210 also passes on Cygwin.
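
For anyone following along, here is a minimal standalone sketch of the normalization idea in the patch above: map any accepted spelling of UTF-8 to the canonical "UTF-8" before the name reaches iconv_open(). The helper names (same_name_icase, looks_like_utf8, canonical_encoding) are hypothetical and only approximate what git's is_encoding_utf8() does; treating a NULL name as UTF-8 is an assumption made for the sketch, not something taken from utf8.c.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: case-insensitive comparison of two encoding names. */
static int same_name_icase(const char *a, const char *b)
{
	while (*a && *b) {
		if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
			return 0;
		a++;
		b++;
	}
	/* Equal only if both strings ended at the same position. */
	return *a == *b;
}

/*
 * Hypothetical stand-in for git's is_encoding_utf8(); it is not the
 * real implementation. Assumption: a missing name means UTF-8.
 */
static int looks_like_utf8(const char *name)
{
	if (!name)
		return 1;
	return same_name_icase(name, "utf-8") ||
	       same_name_icase(name, "utf8");
}

/*
 * Return the spelling to hand to iconv_open(): the canonical "UTF-8"
 * for any recognized UTF-8 synonym, otherwise the name unchanged.
 */
static const char *canonical_encoding(const char *name)
{
	return looks_like_utf8(name) ? "UTF-8" : name;
}
```

A caller would pass both names through canonical_encoding() right before iconv_open(out_encoding, in_encoding), so a platform whose iconv rejects "utf8" never sees that spelling.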