Re: [git-users] Strange effect when tar-ing a cloned repository

Dale R. Worley Tue, 20 Aug 2013 06:50:10 -0700

> From: peter boudewijns <ing...@gmail.com>

> The entire difference could be pinned down in just 1 directory, 'sbin'.


> I do not know enough about the way Linux writes its files, and how it 
> determines the size of the files. But it seems to me the git-cloned files 
> contain empty space that occupies filesystem-space, but is not counted when 
> calculating the actual filesize .....

OK, the issue is "holes" in files.  Unix-like systems have a curious
feature:  A file is a sequence of bytes, numbered from 0 to
(length-1).  But an actual disk block for a section of the file is
only allocated when a program writes bytes in that place in the file.
If no block is allocated for a certain range of bytes (even though those
byte numbers are all less than the file length), those bytes are zero
by definition, and if a program tries to read them, it gets zeros.
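
To see a hole for yourself, here is a minimal Python sketch (my own
illustration, not anything git or tar does).  The helper names
make_sparse_file and sizes are made up for this example.  It assumes a
Unix-like system where os.stat() reports st_blocks in 512-byte units,
as Linux does; whether the hole really stays unallocated depends on
the filesystem.

```python
import os
import tempfile


def make_sparse_file(path, hole_bytes=1024 * 1024):
    """Seek past `hole_bytes` of never-written space, then write one byte."""
    with open(path, "wb") as f:
        f.seek(hole_bytes)  # move the file position without writing anything
        f.write(b"x")       # only this byte has actually been written


def sizes(path):
    """Return (file length in bytes, bytes allocated on disk)."""
    st = os.stat(path)
    # Assumption: st_blocks counts 512-byte units (true on Linux).
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        p = os.path.join(d, "sparse.bin")
        make_sparse_file(p)
        length, allocated = sizes(p)
        print("length:", length, "allocated:", allocated)
        with open(p, "rb") as f:
            # Bytes inside the hole read back as zeros.
            print("hole reads as zeros:", f.read(16) == b"\0" * 16)
```

On a filesystem that supports holes, the allocated figure should come
out much smaller than the length, which is the same mismatch "du" and
"ls -l" show for such a file.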

Normally this never comes up because 99% of programs write files
starting at the beginning and running to the end.  But executable
binary files are the exception.  They have a complex structure, and
because the file contents can be written out of order, certain
sections of the file may exist without ever having been written, so
they read as zeros.  This can create a file that occupies less disk
space than its length.

But ... if you copy such a file with a program that does not notice
the long runs of zeros and carefully avoid writing those blocks,
the new copy will have actual disk blocks allocated containing those
zeros, rather than having them be represented implicitly by the
absence of a disk block.
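
The difference between the two kinds of copy can be sketched in Python
as follows.  This is only an illustration of the idea, not how cp, tar,
or git are actually implemented; the function names and the 4096-byte
CHUNK size are my own assumptions.

```python
import os

CHUNK = 4096  # assumed chunk size; real tools pick their own


def copy_naive(src, dst):
    """Write every byte, so runs of zeros get real disk blocks."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(CHUNK)
            if not chunk:
                break
            fout.write(chunk)


def copy_skipping_zeros(src, dst):
    """Seek over all-zero chunks instead of writing them."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(CHUNK)
            if not chunk:
                break
            if not any(chunk):
                # All zeros: advance the position and leave a hole.
                fout.seek(len(chunk), os.SEEK_CUR)
            else:
                fout.write(chunk)
        # Set the final length in case the file ends with a hole.
        fout.truncate(fout.tell())
```

Both copies read back byte-for-byte identical and have the same length;
only the amount of disk space they occupy can differ, and only on a
filesystem that supports holes.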

Here's one explanation:
http://en.wikipedia.org/wiki/Sparse_files#Sparse_files_in_Unix

Also, read the "du" and "cp" manual pages, looking for the words
"holes" and "sparse", to see situations where this matters.

Dale

