On Tue, Sep 17, 2013 at 11:16:04PM +0300, Michael S. Tsirkin wrote:
> > Thinking about it some more, it's a best effort thing anyway,
> > correct?
> > So how about, instead of doing a hash over the whole input,
> > we hash each chunk and XOR them together?
> > This way it will be stable against chunk reordering, and
> > no need to keep patch in memory.
> > Hmm?
> That was a silly suggestion, two identical chunks aren't that unlikely :)
In a single patch, they should not be, as we should be taking into
account the filenames, no?
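For illustration, the XOR scheme quoted above could be sketched like this (hypothetical code, not anything actually in git; function name and use of SHA-1 are my own assumptions). It shows both the property we want, stability under chunk reordering, and the failure mode: two identical chunks XOR to nothing, so duplicates silently vanish from the id.

```python
import hashlib

def xor_patch_id(chunks):
    # Hypothetical sketch: hash each chunk independently and XOR the
    # digests together. XOR is commutative, so chunk order is irrelevant
    # and no chunk needs to be kept in memory after it is hashed.
    result = bytes(20)  # SHA-1 digest size
    for chunk in chunks:
        digest = hashlib.sha1(chunk).digest()
        result = bytes(a ^ b for a, b in zip(result, digest))
    return result.hex()

# Reordering the chunks gives the same id:
a = xor_patch_id([b"hunk one", b"hunk two"])
b = xor_patch_id([b"hunk two", b"hunk one"])

# But a pair of identical chunks cancels out entirely, so this patch
# gets the same id as a patch containing only "hunk two":
c = xor_patch_id([b"hunk one", b"hunk one", b"hunk two"])
```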
You could also do it hierarchically. Hash each chunk, store only the
hashes, then sort them and hash the result. That still has O(chunks)
storage, but it is only one hash per chunk, not the whole data.
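A sketch of that hierarchical variant (again hypothetical, not git's actual patch-id code; SHA-1 chosen only to match git's hash). Sorting the per-chunk digests makes the result order-independent without the XOR cancellation problem, and only the fixed-size digests need to be kept around:

```python
import hashlib

def sorted_patch_id(chunks):
    # Hash each chunk, store only the 20-byte digests (O(chunks)
    # storage, not the whole data), sort them so that chunk order
    # cannot affect the result, then hash the concatenation.
    digests = sorted(hashlib.sha1(c).digest() for c in chunks)
    return hashlib.sha1(b"".join(digests)).hexdigest()

# Stable against reordering:
x = sorted_patch_id([b"hunk one", b"hunk two"])
y = sorted_patch_id([b"hunk two", b"hunk one"])

# Unlike the XOR scheme, duplicate chunks still change the id:
z = sorted_patch_id([b"hunk one", b"hunk one", b"hunk two"])
```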
A problem with both schemes, though, is that they are not
backwards-compatible with existing git-patch-id implementations. Whereas
sorting the data itself is (kind of, at least with respect to people who
are not using orderfile).