Steven Michalske <smichal...@gmail.com> writes:

> Would having arbitrary key value pairs be useful in the git data
> model?

My answer to the question is that it is harmful to the data model,
but the benefit of going against the data model _may_ outweigh the
downside.  It is all relative.

The first of a very small number of principles of the git data model
is that the object name is derived solely from the contents, hence
we can tell two different things apart with object names without
looking at object contents.
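As a minimal sketch of that principle, assuming the default SHA-1 object format (git can also be set up to use SHA-256): git names an object by hashing a "<type> <size>\0" header followed by the raw bytes, so nothing but the type and the contents goes into the name. The helper name git_object_name is made up for illustration.

```python
import hashlib


def git_object_name(obj_type: str, content: bytes) -> str:
    """Return the SHA-1 name git would give an object of obj_type.

    The hashed input is "<type> <size>\\0" followed by the raw content,
    so the name is a function of the type and the bytes alone.
    """
    header = f"{obj_type} {len(content)}\0".encode("ascii")
    return hashlib.sha1(header + content).hexdigest()


print(git_object_name("blob", b"hello world\n"))
# 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
```

This should agree with what "git hash-object" prints for a file containing "hello world" and a newline, which is why two equal names let us skip looking at the contents.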

This is actively broken by adding "junk" fields left and right.
Adding arbitrary pieces of data that are optional (and largely
ignored by core operations) means you can record objects with
essentially the same contents under different object names, so
object names no longer help us tell two morally equivalent objects
apart.
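To make the breakage concrete, here is a sketch that hashes the same commit twice, once plain and once with an extra header line. The "x-junk" header and the author identity are hypothetical; the tree name is git's well-known empty tree. Everything a core operation cares about is identical, yet the object names differ.

```python
import hashlib


def git_object_name(obj_type: str, content: bytes) -> str:
    # SHA-1 over "<type> <size>\0" plus the raw content (SHA-1 format assumed).
    header = f"{obj_type} {len(content)}\0".encode("ascii")
    return hashlib.sha1(header + content).hexdigest()


# A hypothetical commit body pointing at the empty tree.
base = (
    b"tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    b"author A U Thor <author@example.com> 1234567890 +0000\n"
    b"committer A U Thor <author@example.com> 1234567890 +0000\n"
    b"\n"
    b"Initial commit\n"
)

# The same commit with a hypothetical optional "junk" header added at the
# end of the header block (just before the blank line that starts the message).
with_junk = base.replace(b"\n\n", b"\nx-junk some-extra-data\n\n", 1)

print(git_object_name("commit", base) == git_object_name("commit", with_junk))
# False
```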

But "if two objects have different names, they are not the same"
does not have to be the only and the absolute truth in all contexts;
the world is not so black and white.  Depending on the application
and the context, you may want to treat two things that are not the
same as equivalents.

For example, at the blob level, two blob objects that store the same
text (say, one original and the other double-spaced) would
be different objects and have different object names, but you may
want to treat them as "equivalents" (not same but interchangeable),
by applying a textconv filter to normalize their contents when
comparing them.  We still keep the "two objects with different names
are different" principle, but at the same time, allow users to treat
them as equivalent in specific contexts.
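A rough sketch of "different but equivalent" at the blob level, assuming a hypothetical normalize() that stands in for a textconv filter (in git itself such a filter is configured through diff.<driver>.textconv and selected with a diff attribute). The object names stay distinct; only the context-specific comparison treats the two blobs as interchangeable.

```python
import hashlib


def blob_name(content: bytes) -> str:
    # git's blob name: SHA-1 over "blob <size>\0" plus the content.
    return hashlib.sha1(f"blob {len(content)}\0".encode("ascii") + content).hexdigest()


def normalize(content: bytes) -> bytes:
    """Hypothetical textconv-like filter: undo double spacing by dropping
    blank lines. It also discards real paragraph breaks, which is fine for
    this illustration but not for a general-purpose filter."""
    lines = content.split(b"\n")
    return b"\n".join(line for line in lines if line.strip())


original = b"first line\nsecond line\n"
double_spaced = b"first line\n\nsecond line\n\n"

print(blob_name(original) == blob_name(double_spaced))  # False: different objects
print(normalize(original) == normalize(double_spaced))  # True: equivalent here
```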

Introducing a hack to exclude selected "junk" fields from hashing
done for object name computation is not a solution and is out of the
question, but that does not necessarily mean that commit objects
should never be extended with new types of header fields.  When a
commit object is made with a "junk" field, it will have a name that
is different from the one it would get without the "junk" field, but
the benefit of the ability to store extra data _may_ outweigh the
downside of having to always compare the contents of two objects
with different names to find out that they are different but
equivalent.
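The cost described above can be sketched as follows, under loudly stated assumptions: the header names in JUNK_HEADERS are hypothetical, each junk header is assumed to fit on a single line, and the name is still computed over the whole object. Equal names settle the question at once; different names force a look inside the objects before concluding they are merely equivalent.

```python
import hashlib

# Hypothetical optional headers that core operations would ignore.
JUNK_HEADERS = {b"x-junk"}


def commit_name(body: bytes) -> str:
    # The name still covers every byte, junk headers included.
    return hashlib.sha1(f"commit {len(body)}\0".encode("ascii") + body).hexdigest()


def strip_junk(body: bytes) -> bytes:
    """Drop single-line header lines whose key is in JUNK_HEADERS."""
    headers, sep, message = body.partition(b"\n\n")
    kept = [line for line in headers.split(b"\n")
            if line.split(b" ", 1)[0] not in JUNK_HEADERS]
    return b"\n".join(kept) + sep + message


def equivalent(a: bytes, b: bytes) -> bool:
    if commit_name(a) == commit_name(b):
        return True  # same name: same contents, no need to look inside
    # Different names: we have to compare contents to find out more.
    return strip_junk(a) == strip_junk(b)


base = (
    b"tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    b"author A U Thor <author@example.com> 1234567890 +0000\n"
    b"committer A U Thor <author@example.com> 1234567890 +0000\n"
    b"\n"
    b"Initial commit\n"
)
with_junk = base.replace(b"\n\n", b"\nx-junk some-extra-data\n\n", 1)

print(commit_name(base) == commit_name(with_junk))  # False
print(equivalent(base, with_junk))                  # True
```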