Jeff King <[email protected]> writes:
> 1. I suppose we could also use $LANG or one of the $LC_* variables to
> guess at the encoding of the user's pattern. But I think using the
> output encoding makes the most sense, since then the pattern you
> searched for will actually be in the output.
I agree. In addition, if we were to do anything with LANG/LC_CTYPE,
it should be done at the layer that implements log-output-encoding
(e.g. lack of configured encoding with nonstandard LANG/LC_CTYPE
would use the locale, or something), I think.
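To make the point about matching in the output encoding concrete, here is a minimal sketch that uses only printf and grep and does not touch git at all. The variable names mirror the quoted test; the sample line and the result variables are made up for illustration.

```shell
#!/bin/sh
# Illustrative only: the same "é" encoded two ways.
utf8_e=$(printf '\303\251')   # UTF-8 bytes c3 a9
latin1_e=$(printf '\351')     # ISO-8859-1 byte e9

# A line as it would appear in UTF-8 log output (hypothetical sample).
utf8_line="t${utf8_e}st"

# Byte-wise matching (LC_ALL=C) succeeds only when the pattern is in
# the same encoding as the text being searched.
if printf '%s\n' "$utf8_line" | LC_ALL=C grep -F -q -e "$utf8_e"
then
    same_encoding=match
else
    same_encoding=no-match
fi

if printf '%s\n' "$utf8_line" | LC_ALL=C grep -F -q -e "$latin1_e"
then
    other_encoding=match
else
    other_encoding=no-match
fi

echo "utf8 pattern: $same_encoding, latin1 pattern: $other_encoding"
```

The latin1 pattern cannot match because the byte e9 never occurs in the UTF-8 text, which is why a pattern given in the output encoding is the one that will actually be visible in the output.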
> 2. There are still problems with utf8 normalization. E.g., my tests
> represent utf-8 é with \xc3\xa9 (the code point for that glyph),
> but it could also be represented by \x65\xcc\x81 (e + combining
> acute). But that is not a new problem; it is an inherent issue with
> grepping utf8. We might in the future want to offer an option to
> normalize utf8 (or possibly the regex library can be taught to
> handle this).
True; in either case, this caller (or any other callers) should not
have to care. Only grep_buffer() (actually, grep_source_1()) needs to be
taught about it.
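As a rough illustration of the normalization issue, the two byte sequences from the quoted text can be compared with plain printf and grep. This is only a sketch of the mismatch, not of any normalization support in git or in a regex library; the variable names are hypothetical.

```shell
#!/bin/sh
# Two UTF-8 byte sequences that both render as "é".
precomposed=$(printf '\303\251')   # U+00E9: bytes c3 a9
decomposed=$(printf 'e\314\201')   # U+0065 U+0301: bytes 65 cc 81

# Searching text written with the decomposed form for the precomposed
# form fails, because the comparison is done on bytes.
if printf 't%sst\n' "$decomposed" | LC_ALL=C grep -F -q -e "$precomposed"
then
    result=match
else
    result=no-match
fi

echo "$result"
```

Without a normalization step on both the pattern and the haystack, the two spellings stay distinct to the matcher even though they look identical on screen.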
> 4. I'm still not clear on why "--graph --no-walk" wants to look at
> commit_match after we have already cleared the commit buffer. I
> agree it's nonsensical, but I wonder if it might be a symptom of an
> underlying bug or inefficiency.
Yeah, that is something we may want to check, I agree.
The added test is also nice. Thanks.
> diff --git a/t/t4210-log-i18n.sh b/t/t4210-log-i18n.sh
> new file mode 100755
> index 0000000..52a7472
> --- /dev/null
> +++ b/t/t4210-log-i18n.sh
> @@ -0,0 +1,58 @@
> +#!/bin/sh
> +
> +test_description='test log with i18n features'
> +. ./test-lib.sh
> +
> +# two forms of é
> +utf8_e=$(printf '\303\251')
> +latin1_e=$(printf '\351')
> +
> +test_expect_success 'create commits in different encodings' '
> + test_tick &&
> + cat >msg <<-EOF &&
> + utf8
> +
> + t${utf8_e}st
> + EOF
> + git add msg &&
> + git -c i18n.commitencoding=utf8 commit -F msg &&
> + cat >msg <<-EOF &&
> + latin1
> +
> + t${latin1_e}st
> + EOF
> + git add msg &&
> + git -c i18n.commitencoding=ISO-8859-1 commit -F msg
> +'
> +
> +test_expect_success 'log --grep searches in log output encoding (utf8)' '
> + cat >expect <<-\EOF &&
> + latin1
> + utf8
> + EOF
> + git log --encoding=utf8 --format=%s --grep=$utf8_e >actual &&
> + test_cmp expect actual
> +'
> +
> +test_expect_success 'log --grep searches in log output encoding (latin1)' '
> + cat >expect <<-\EOF &&
> + latin1
> + utf8
> + EOF
> + git log --encoding=ISO-8859-1 --format=%s --grep=$latin1_e >actual &&
> + test_cmp expect actual
> +'
> +
> +test_expect_success 'log --grep does not find non-reencoded values (utf8)' '
> + >expect &&
> + git log --encoding=utf8 --format=%s --grep=$latin1_e >actual &&
> + test_cmp expect actual
> +'
> +
> +test_expect_success 'log --grep does not find non-reencoded values (latin1)' '
> + >expect &&
> + git log --encoding=ISO-8859-1 --format=%s --grep=$utf8_e >actual &&
> + test_cmp expect actual
> +'
> +
> +test_done