On Dec 18, 11:44 am, Ido Ran <ido....@gmail.com> wrote:

> The thing is that I have more than two entity types and each entity is
> identified by a unique string (say, a GUID).
> If I include the unique ID in the hash process it will destroy
> any chance of two content-identical entities having the same hash. As
> a result the database will blow up.
> The interesting thing is that if I completely remove the unique
> identifier and "serialize" the object graph the way git serializes a
> directory structure, I'll be able to re-create the whole graph back in
> memory but I will not be able to identify the entities.

I may be mistaken, of course, but you seem to overlook the fact that
blobs are hashed using their contents only, not their names (names
contribute only to the hashes of tree objects). So you can solve this
simply by naming your blobs (that is, files) after the GUIDs of the
entities they represent. Any two blobs with identical contents but
different "filesystem" names will then hash to the same value and thus
will not be stored twice. Only tree objects will multiply, but
presumably their growth will be comparatively slow as long as your
blobs are large relative to the tree entries that reference them.
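
To illustrate the point, here is a rough Python sketch of how git derives
object IDs. The helper names (git_object_id, blob_id, tree_id) and the
"guid-..." file names are made up for this example, and the tree encoding
is simplified: it assumes a flat directory of regular files with mode
100644 and SHA-1 object IDs.

```python
import hashlib
from typing import Mapping


def git_object_id(kind: str, body: bytes) -> str:
    # Git hashes "<type> <size>\0" followed by the raw object body.
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def blob_id(content: bytes) -> str:
    # A blob's ID depends on its contents only; no file name is involved.
    return git_object_id("blob", content)


def tree_id(files: Mapping[str, bytes]) -> str:
    # Simplified tree: flat directory, regular files only (mode 100644).
    # Each entry is "<mode> <name>\0" plus the 20-byte binary blob ID,
    # with entries sorted by name.
    names = sorted(name.encode() for name in files)
    body = b""
    for raw_name in names:
        raw_id = bytes.fromhex(blob_id(files[raw_name.decode()]))
        body += b"100644 " + raw_name + b"\0" + raw_id
    return git_object_id("tree", body)


# Two entities with identical contents but different (hypothetical) GUIDs.
payload = b"same entity payload"
tree_a = tree_id({"guid-aaaa": payload})
tree_b = tree_id({"guid-bbbb": payload})

print(blob_id(payload))   # one blob ID, shared by both entities
print(tree_a == tree_b)   # the trees differ, because the names differ
```

The payload is stored once as a single blob, while each GUID costs only
one small tree entry (mode, name, and a 20-byte ID).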

