On Fri, Jun 27, 2014 at 10:48 AM, Junio C Hamano <gits...@pobox.com> wrote:
> Even though the original question mentioned "delta discovery", I
> think what was being asked is not "delta" in the Git sense (which
> your answer is about) but is "can we diff two long sequences of text
> (that happens to consist of only 4-letter alphabet but that is an
> irrelevant detail) without holding both in-core in their entirety?",
> which is a more relevant question/desire from the application point
> of view.
.. even there, there's another issue. With enough memory, the diff
itself should be fairly reasonable to do, but we do not have any sane
*format* for diffing those kinds of things.
The regular textual diff is line-based, and is not amenable to
comparing two long lines. You'll just get a diff that says "the two
really long lines are different".
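
To make the line-based limitation concrete, here is a minimal sketch using Python's standard difflib rather than git itself. The sequence and the position of the substitution are made up for illustration. A single changed base in a one-line sequence yields a hunk that removes the whole line and adds the whole line back:

```python
import difflib

# Hypothetical data: one long "line" over a 4-letter alphabet.
old_line = "ACGTACGTTTGACCA" * 4 + "\n"
# Substitute a single base at offset 20 (a 'C' becomes a 'G').
new_line = old_line[:20] + "G" + old_line[21:]

# Line-based unified diff: the unit of comparison is the whole line.
diff = list(difflib.unified_diff([old_line], [new_line], "old", "new"))

for entry in diff:
    print(entry, end="")
```

The hunk carries both full lines, so the reader still has to find the one changed base by hand.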
The binary diff option should work, but it is a horrible output
format, and not very helpful. It contains all the relevant data ("copy
this chunk from here to here"), but it's then shown in a binary
encoding that isn't really all that useful if you want to say "what
are the differences between these two chromosomes".
I think it might be possible to just specify a special diff algorithm
(git already supports that, obviously), and just introduce a new "use
binary diffs with a textual representation" model.
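
As a rough sketch of what a "binary diff with a textual representation" could look like, the following turns two sequences into copy/insert instructions and prints them as text. This is not git's delta code or any existing git output format. The function names and the printed layout are hypothetical, and difflib.SequenceMatcher merely stands in for a real delta algorithm (it would not scale to 1GB inputs):

```python
import difflib

def make_delta(old, new):
    """Describe `new` as copies from `old` plus literal inserts."""
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    delta = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Reuse a chunk of the old data: (offset, length).
            delta.append(("copy", i1, i2 - i1))
        elif j2 > j1:
            # New data with no counterpart in old; pure deletions
            # need no instruction, the chunk is simply never copied.
            delta.append(("insert", new[j1:j2]))
    return delta

def format_delta(delta):
    """Render the instructions one per line (hypothetical layout)."""
    lines = []
    for op in delta:
        if op[0] == "copy":
            lines.append("copy offset=%d len=%d" % (op[1], op[2]))
        else:
            lines.append("insert %s" % op[1])
    return lines

def apply_delta(old, delta):
    """Rebuild the new sequence from the old one and the delta."""
    out = []
    for op in delta:
        if op[0] == "copy":
            out.append(old[op[1]:op[1] + op[2]])
        else:
            out.append(op[1])
    return "".join(out)

old_seq = "ACGTACGTTTGACCAGGTA"
new_seq = "ACGTACGATTGACCATTGGTA"
delta = make_delta(old_seq, new_seq)
print("\n".join(format_delta(delta)))
```

The information is the same "copy this chunk from here to here" data that the binary format carries, just printed in a form a person can read.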
But it also sounds like there might be some actual performance problem
with these 1GB file delta-calculations. Which I wouldn't be surprised
about at all.
Jarrad - is there any public data you could give as an example and for
people to play with?