On Mon, Dec 11, 2017 at 10:50 AM,  <lars.schnei...@autodesk.com> wrote:
> From: Lars Schneider <larsxschnei...@gmail.com>
>
> Git and its tools (e.g. git diff) expect all text files in UTF-8
> encoding. Git will happily accept content in all other encodings, too,
> but it might not be able to process the text (e.g. viewing diffs or
> changing line endings).
>
> Add an attribute to tell Git what encoding the user has defined for a
> given file. If the content is added to the index, then Git converts the
> content to a canonical UTF-8 representation. On checkout Git will
> reverse the conversion.
>
> Reviewed-by: Patrick Lühne <patr...@luehne.de>
> Signed-off-by: Lars Schneider <larsxschnei...@gmail.com>
> ---
> diff --git a/convert.c b/convert.c
> @@ -256,6 +257,149 @@ static int will_convert_lf_to_crlf(size_t len, struct text_stat *stats,
> +static int encode_to_git(const char *path, const char *src, size_t src_len,
> +                        struct strbuf *buf, struct encoding *enc)
> +{
> +#ifndef NO_ICONV
> +       char *dst, *re_src;
> +       int dst_len, re_src_len;
> +
> +       /*
> +        * No encoding is specified or there is nothing to encode.
> +        * Tell the caller that the content was not modified.
> +        */
> +       if (!enc || (src && !src_len))
> +               return 0;
> +
> +       /*
> +        * Looks like we got called from "would_convert_to_git()".
> +        * This means Git wants to know if it would encode (= modify!)
> +        * the content. Let's answer with "yes", since an encoding was
> +        * specified.
> +        */
> +       if (!buf && !src)
> +               return 1;
> +
> +       if (enc->to_git == invalid_conversion) {
> +               enc->to_git = iconv_open(default_encoding, encoding->name);
> +               if (enc->to_git == invalid_conversion)
> +                       warning(_("unsupported encoding %s"), encoding->name);
> +       }
> +
> +       if (enc->to_worktree == invalid_conversion)
> +               enc->to_worktree = iconv_open(encoding->name, default_encoding);

Do you need to be calling iconv_close() somewhere on the result of the
iconv_open() calls? [Answering myself after reading the rest of the
patch: You're caching these opened 'iconv' descriptors, so you don't
plan on closing them.]

> + [...]
> +       /*
> +        * Encode dst back to ensure no information is lost. This wastes
> +        * a few cycles as most conversions are round trip conversion
> +        * safe. However, content that has an invalid encoding might not
> +        * match its original byte sequence after the UTF-8 conversion
> +        * round trip. Let's play safe here and check the round trip
> +        * conversion.
> +        */
> +       re_src = reencode_string_iconv(dst, dst_len, enc->to_worktree, &re_src_len);
> +       if (!re_src || strcmp(src, re_src)) {

You're using strcmp() as opposed to memcmp() because you expect
're_src' will unconditionally be UTF-8-encoded, right?

> +               die(_("encoding '%s' from %s to %s and back is not the same"),
> +                       path, enc->name, default_encoding);
> +       }
> +       free(re_src);
> +
> +       strbuf_attach(buf, dst, dst_len, dst_len + 1);
> +       return 1;
> +#else
> +       warning(_("cannot encode '%s' from %s to %s because "
> +               "your Git was not compiled with encoding support"),
> +               path, enc->name, default_encoding);
> +       return 0;
> +#endif
> +}
