On Fri, 7 Dec 2012 08:48:41 -0800 (PST)
John McKown <john.archie.mck...@gmail.com> wrote:
> That's the question. Or I really am not understanding what all is in
> there. But what I understand (if correct) is that the files
> in .git/objects are such that the subdirectory name is the first two
> hex characters of the SHA1SUM value of the contents, the file name
> within that subdirectory are the last 38 characters. And the contents
> of the file are a compressed version of the data.
That's just an implementation detail which does not really affect the
way Git works from the standpoint of its user.
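To make that layout concrete, here is a minimal Python sketch of how a loose object's name and path could be derived. One detail beyond the description above: the hash is taken over a short header ("<type> <size>" plus a NUL byte) followed by the contents, not over the bare contents, and the file on disk is the zlib-compressed form of that same header-plus-contents. The function name `loose_object` is made up for illustration, and the sketch assumes a repository using the default SHA-1 object format.

```python
import hashlib
import zlib


def loose_object(data: bytes, kind: str = "blob"):
    """Return (object id, relative path, compressed bytes) for a loose object.

    Hypothetical helper for illustration; assumes the SHA-1 object format.
    """
    # Git hashes a small header ("<type> <size>\0") followed by the content.
    store = f"{kind} {len(data)}".encode() + b"\x00" + data
    oid = hashlib.sha1(store).hexdigest()        # 40 hex characters
    # First 2 hex characters name the directory, the other 38 name the file.
    path = f".git/objects/{oid[:2]}/{oid[2:]}"
    return oid, path, zlib.compress(store)


oid, path, packed = loose_object(b"hello\n")
print(oid)
print(path)
```

The printed ID should match what `git hash-object` reports for a file with the same contents.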
Also note that when the number of objects and commits grows, Git starts
to organise them into so-called "pack files", so the number of
those "loose" objects is not usually high. The same applies to
references (branches and tags), which might get "archived" as well when
Git thinks they should be. As with packed objects, this is transparent
to the users and higher-level Git tools.
> In any case, once
> the file is created, the contents are never updated. In fact, if they
> were updated, git would likely be royally messed up. If the original
> file is modified, a new SHA1SUM file is generated. So, why doesn't
> git just mark the files as "read only"?
First, there's the question of filesystem semantics.
The "read only" attribute found on FAT (or maybe earlier filesystems)
actually compensated for the complete lack of any access controls (and
of user identities in the OS, for that matter), which are what is
typically used nowadays to restrict access to files. These attributes
were carried over to NTFS to provide backward compatibility.
Outside of FAT and NTFS, I know that the extN filesystems implemented by
the Linux kernel support a set of attributes as well, but this set is
different from that of the Microsoft filesystems. extN supports the
so-called "immutable" attribute, which is sort-of equivalent to "read only".
Note that the standard C library, which defines the lowest-level
file-manipulation functions such as open(2), does not mention file
attributes.
What I'm trying to convey is that there's no such thing as a universally
available and agreed-upon convention for supporting "read only"
attributes on files; hence to implement it, Git would have to maintain
a list of "known" filesystems and be able to use filesystem-specific
methods to manipulate attributes on those files.
But now consider that if Git (running under the same credentials as
you, the logged in user) is able to set that "read only" attribute on a
file, you still have full privileges to unset it, hence this kind of
"protecting" would be just a single level of fool-proofing and nothing
more. Say, imagine the standard Windows Explorer dialog for deleting
files -- if it encounters a file marked as read only, it asks whether
you are really sure you want to delete that file, and I don't think I
have to assure you that most users just click through all those nagging
dialogs which get in their way. So really, implementing this sort of
protection is too much trouble for too little gain.
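A small Python sketch of that point, under the assumption that the process owns the file: whoever can set the read-only flag can also clear it again, so the flag only guards against accidents. The file name "object" is made up; `os.chmod` with `stat.S_IREAD` / `stat.S_IWRITE` is used because it is the one combination Python documents as working on Windows as well as on POSIX systems.

```python
import os
import stat
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "object")   # hypothetical stand-in for a Git object file
    with open(path, "wb") as f:
        f.write(b"original\n")

    # Mark the file read-only, as Git hypothetically could.
    os.chmod(path, stat.S_IREAD)
    read_only = not (os.stat(path).st_mode & stat.S_IWRITE)

    # The same user can lift the restriction just as easily...
    os.chmod(path, stat.S_IREAD | stat.S_IWRITE)
    # ...and then modify the file.
    with open(path, "ab") as f:
        f.write(b"tampered\n")
    with open(path, "rb") as f:
        contents = f.read()

print(read_only, contents)
```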
The primary way to protect your Git repository is to have backups.
Naturally, with Git (as with any DVCS), this could just amount to
periodically mirroring your changes to some offsite repository.
Moreover, if several developers are working on a project, each having
their own private clone on their own hardware, the project becomes
really hard to kill completely: if someone trashes their repository,
they can just re-clone it from someone else's copy.
If you're worried about what happens if someone/something
deletes/modifies a Git object and this goes unnoticed, then it's not
that bad either:
1) When doing certain operations, Git re-checks the hashes of objects.
2) Git has the `git fsck` program, which is able to perform a thorough
   analysis of a Git repository, so if you have any doubts, you could
   just run it on a suspicious repository.
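To give a rough idea of what re-checking a hash amounts to, here is a Python sketch that walks the loose objects of a repository and reports those whose contents no longer match their names. It is only an illustration, not how `git fsck` is implemented: the function name `check_loose_objects` is made up, the SHA-1 object format is assumed, and packed objects and everything else `git fsck` inspects are ignored.

```python
import hashlib
import os
import zlib

HEX = set("0123456789abcdef")


def check_loose_objects(git_dir: str):
    """Return paths of loose objects whose content does not match their name.

    Hypothetical helper; assumes SHA-1 object names and skips packed objects.
    """
    bad = []
    objects = os.path.join(git_dir, "objects")
    for sub in sorted(os.listdir(objects)):
        subdir = os.path.join(objects, sub)
        # Loose objects live in two-hex-digit directories; skip "pack", "info".
        if len(sub) != 2 or not set(sub) <= HEX or not os.path.isdir(subdir):
            continue
        for name in sorted(os.listdir(subdir)):
            if len(name) != 38 or not set(name) <= HEX:
                continue  # not an object file (e.g. a temporary file)
            path = os.path.join(subdir, name)
            try:
                with open(path, "rb") as f:
                    store = zlib.decompress(f.read())
                ok = hashlib.sha1(store).hexdigest() == sub + name
            except (OSError, zlib.error):
                ok = False  # unreadable or not valid zlib data
            if not ok:
                bad.append(path)
    return bad
```

Calling `check_loose_objects(".git")` from the top of a work tree would return an empty list if every loose object is intact. For a real check, `git fsck` remains the tool to use.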