On Sat, Jun 28, 2014 at 1:56 AM, Thomas Braun
<[email protected]> wrote:
>> The name is misleading and forced me to read it twice before I
>> realized that this is "populating the is-binary bit". It might make
>> it a bit better if you renamed it to DIFF_POPULATE_IS_BINARY_BIT or
>> perhaps DIFF_POPULATE_CHECK_BINARY or something. For consistency,
>> the other bit may want to be also renamed from SIZE_ONLY to either
>>
>> (1) CHECK_SIZE_ONLY
>>
>> (2) One bit for CHECK_SIZE, another for NO_CONTENTS, and optionally
>> make SIZE_ONLY the union of two
>>
>> I do not have strong preference either way; the latter may be more
>> logical in that "not loading contents" and "check size" are sort of
>> orthogonal in that you can later choose to check, without loading
>> contents, only the binary-ness without checking size, but no calles
>> that passes a non-zero flag to the populate-filespec function will
>> want to slurp in the contents in practice, so in that sense we could
>> declare that the NO_CONENTS bit is implied.
Will do (and probably go with (1) as I still prefer zero as "good defaults")
>> But more importantly, would this patch actually help?
Well yes as demonstrated by the new test ;-) Unfortunately the scope
of help is limited to --stat.. I should have done more thorough
testing.
>> For one
>> thing, this wouldn't (and shouldn't) help if the user wants --binary
>> diff out of us anyway, I suspect, but I wonder what the following
>> codepath in the builtin_diff() function would do:
>>
>> ...
>> } else if (!DIFF_OPT_TST(o, TEXT) &&
>> ( (!textconv_one && diff_filespec_is_binary(one)) ||
>> (!textconv_two && diff_filespec_is_binary(two)) )) {
>> if (fill_mmfile(&mf1, one) < 0 || fill_mmfile(&mf2, two) < 0)
>> die("unable to read files to diff");
>> /* Quite common confusing case */
>> if (mf1.size == mf2.size &&
>> !memcmp(mf1.ptr, mf2.ptr, mf1.size)) {
>> if (must_show_header)
>> fprintf(o->file, "%s", header.buf);
>> goto free_ab_and_return;
>> }
>> fprintf(o->file, "%s", header.buf);
>> strbuf_reset(&header);
>> if (DIFF_OPT_TST(o, BINARY))
>> emit_binary_diff(o->file, &mf1, &mf2, line_prefix);
>> else
>> fprintf(o->file, "%sBinary files %s and %s differ\n",
>> line_prefix, lbl[0], lbl[1]);
>> o->found_changes = 1;
>> } else {
>> ...
>>
>> If we weren't told with --text/-a to force textual output, and
>> at least one of the sides is marked as binary (and this patch marks
>> a large blob as binary unless attributes says otherwise), we still
>> call fill_mmfile() on them to slurp the contents to be compared, no?
>>
>> And before you get to the DIFF_OPT_TST(o, BINARY), you memcmp(3) to
>> check if the sides are identical, so...
>
> Good point. So how about an additional change roughly sketched as
>
> @@ -2223,6 +2223,14 @@ struct userdiff_driver *get_textconv(struct
> diff_filespec *one)
> return userdiff_get_textconv(one->driver);
> }
>
> +/* read the files in small chunks into memory and compare them */
> +static int filecmp_chunked(struct diff_filespec *one,
> + struct diff_filespec *two)
> +{
> + // TODO add implementation
> + return 0;
> +}
> +
We have object streaming interface to do similar like this. In fact
index-pack already does large file memcmp() for hash collision test. I
think I can move some code around and support large file in this code
path..
--
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html