On 10/9/12 12:17 PM, John Whitney wrote:
> Thank you very much for your detailed explanations. I suspected that efficiency concerns might be preventing a clean solution.
>
> How about this idea... When git stores files, it could include a bit of metadata that tells it whether the file is a binary blob or text. (Perhaps it already does this?) If a binary blob (in the repository) is being compared with a text file (on the filesystem), git could re-process the blob and get the "sha1 of the canonical stripped version". In all other situations, the original SHA1 should be correct, since git already removes CRs from the line endings in files it recognizes as text.
>
> I would think that this solution would have no performance penalty for "fixed" repositories. (It would only have a small performance hit when binary blobs are compared against text files, which is rare even in broken repositories.) Git could even throw a warning like: "File xyz.txt was originally stored as a binary blob."
>
> What do you think?


I'm going to reply to myself, to save you the trouble of replying. (You've been very helpful and I do appreciate it.)

I guess the problem with this idea is that git doesn't have any way to distinguish between binary blobs and text files in the repository. I think it would be useful information, but I guess that bridge burned a long time ago. So any metadata would have to be stored separately. Jeff, that's roughly equivalent to your idea of caching, which would take a lot of work to implement.
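
To make that concrete, here is a rough Python sketch (not git's actual code) of how a blob ID is computed. The ID is a SHA-1 over a "blob <size>" header, a NUL byte, and the raw content, so nothing in the object records whether the content was treated as text or binary. The helper names git_blob_sha1 and strip_crs are made up for illustration, and strip_crs is only a crude stand-in for git's real line-ending conversion.

```python
import hashlib


def git_blob_sha1(content: bytes) -> str:
    """Blob ID as git computes it: SHA-1 over b'blob <size>\\0' + content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def strip_crs(content: bytes) -> bytes:
    """Hypothetical stand-in for git's CRLF -> LF conversion of text files."""
    return content.replace(b"\r\n", b"\n")


# A file that was committed with its CRLFs intact (stored "as a binary blob").
stored_blob = b"hello\r\nworld\r\n"
# The identical bytes sitting in the working tree.
worktree_file = b"hello\r\nworld\r\n"

stored_id = git_blob_sha1(stored_blob)
# What git would hash once it starts treating the file as text.
normalized_id = git_blob_sha1(strip_crs(worktree_file))

# The bytes are identical, but the IDs differ, so the file looks modified.
print(stored_id == normalized_id)  # False
# Re-processing the stored blob the same way is the only way to make them match.
print(git_blob_sha1(strip_crs(stored_blob)) == normalized_id)  # True
```

The last comparison is the "re-process the blob" step from my earlier mail. Since the object itself carries no text/binary flag, git would have to do that work (or cache the result somewhere) for every such comparison.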

Thank you so much for helping me to understand the reason git behaves the way it does. It's a great tool!
