* Ingo Molnar <[EMAIL PROTECTED]> wrote:
> i'd be surprised if it was twice as fast - cache-cold linear checkouts
> are _seek_ limited, and it doesnt matter whether after a 1-2 msec
> track-to-track disk seek the DMA engine spends another 30 microseconds
> DMA-ing 60K uncompressed data instead of 30K compressed... (there are
> other factors, but this is the main thing.)
i've benchmarked cache-cold compressed vs. uncompressed performance, to
shed some more light on the performance differences between flat and
i did a lot of testing, and i primarily concentrated on being able to
_trust_ the benchmark results, not on generating some quick numbers. The
major problem was that the timing of the reads associated with 'checking
out a large tree' is very unstable, even on a completely isolated
test system with very common (and predictable) IO hardware.
the content i tested was a vanilla 2.6.10 kernel tree, with 19042 files
in it, taking 246 MB uncompressed, and 110 MB compressed (via gzip -9).
Average file size is 13.2 KB uncompressed, 5.9 KB compressed.
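As a quick sanity check on the averages quoted above, here is a minimal Python sketch of the arithmetic. It assumes that "MB" means 2**20 bytes and "KB" means 1024 bytes (the post does not say which), and the function name is purely illustrative.

```python
FILES = 19042  # number of files in the 2.6.10 tree, as stated in the post


def avg_file_kib(total_mib, n_files=FILES):
    """Average file size in KiB, assuming total_mib is in units of 2**20 bytes."""
    return total_mib * 1024 / n_files


flat_avg = avg_file_kib(246)  # uncompressed tree
gzip_avg = avg_file_kib(110)  # per-file gzip -9 tree

print(f"uncompressed: {flat_avg:.1f} KB, compressed: {gzip_avg:.1f} KB")
print(f"compressed/uncompressed size ratio: {110 / 246:.2f}")
```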
Firstly, the timings are very sensitive to the way the tree was created.
To have a 'fair' on-disk layout the trees have to be created in an
identical fashion: e.g. it is not valid to copy the uncompressed tree
and run gzip over it - that will create a 'sparse' on-disk layout,
penalizing the compressed layout and making it about 40% slower than the
uncompressed layout! I first created the two trees, then i "cp -a"-ed
them over into a new directory one after the other, so that they end up
at similar on-disk positions as well. I also created 2 more pairs of such
trees to make sure the disk layout is fair.
all timings were taken fresh after reboot, on a UP 1 GB RAM Athlon64
3200+, using a large, top of the line IDE disk. The kernel was
2.6.12-rc2, the filesystem was ext3 with enough free space to not be
fragmented, both noatime and nodiratime were specified so that no write
activity whatsoever occurs during the 'checkout'.
the operation timed was a simple:

    time find . -type f | xargs cat > /dev/null

done in the root of the given tree. This generates the very same
read-only IO pattern for each test. I've run the tests 10 times (i.e.
have done 10 fresh reboots), but after every reboot i permuted the
order of trees tested - to make sure there is no interaction between
trees. (there was no interaction)
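For readers who want to reproduce a similar measurement without the shell pipeline, the following is a rough Python sketch of the same idea: walk a tree, read every regular file, discard the data, and time the whole pass. It is not the command used in the post; `read_tree` and `timed_read` are hypothetical names, the traversal order may differ from that of `find`, and the script does nothing to make the run cache-cold (the post relied on a fresh reboot for that).

```python
import os
import time


def read_tree(root):
    """Read every regular file under root, discarding the data.

    Returns (number_of_files, total_bytes_read). Symlinks are skipped,
    roughly matching what 'find . -type f' selects.
    """
    n_files = 0
    n_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            with open(path, "rb") as f:
                while True:
                    chunk = f.read(64 * 1024)
                    if not chunk:
                        break
                    n_bytes += len(chunk)
            n_files += 1
    return n_files, n_bytes


def timed_read(root):
    """Return (elapsed_seconds, number_of_files, total_bytes_read) for one pass."""
    start = time.perf_counter()
    n_files, n_bytes = read_tree(root)
    return time.perf_counter() - start, n_files, n_bytes


if __name__ == "__main__":
    elapsed, n_files, n_bytes = timed_read(".")
    print(f"{n_files} files, {n_bytes} bytes, {elapsed:.1f} s elapsed")
```

A second pass over the same tree in the same boot will mostly be served from the page cache, so only the first pass after a reboot says anything about cache-cold behaviour.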
here are the raw numbers, elapsed real time in seconds:
flat-1: 29.7 29.5 29.4 29.4 29.5 29.5 29.7 29.6 29.4 29.6 29.5 29.4: 29.5
gzip-1: 41.2 40.9 40.7 40.7 40.5 41.7 41.0 40.3 40.6 40.8 40.8 40.9: 40.8
flat-2: 28.0 28.2 27.7 27.9 27.8 27.9 27.7 27.9 27.9 28.1 27.9 28.0: 27.9
gzip-2: 27.2 27.4 27.4 27.2 27.2 27.2 27.2 27.2 27.1 27.3 27.2 27.4: 27.2
flat-3: 27.0 27.8 27.6 27.7 27.8 27.8 27.8 27.7 27.8 27.6 27.8 27.8: 27.6
gzip-3: 25.8 26.8 26.6 26.5 26.5 26.5 26.6 26.4 26.5 26.7 26.6 26.7: 26.5
The final column is the average. (Standard deviation is below 0.1 sec,
less than 0.5%.)
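The summary statistics can be recomputed from the rows as listed. The sketch below is only an illustration using Python's standard `statistics` module; it uses the sample standard deviation, which is an assumption, since the post does not say which estimator was used.

```python
from statistics import mean, stdev

# Elapsed real time in seconds, copied from the rows above (average column excluded).
RUNS = {
    "flat-1": [29.7, 29.5, 29.4, 29.4, 29.5, 29.5, 29.7, 29.6, 29.4, 29.6, 29.5, 29.4],
    "gzip-1": [41.2, 40.9, 40.7, 40.7, 40.5, 41.7, 41.0, 40.3, 40.6, 40.8, 40.8, 40.9],
    "flat-2": [28.0, 28.2, 27.7, 27.9, 27.8, 27.9, 27.7, 27.9, 27.9, 28.1, 27.9, 28.0],
    "gzip-2": [27.2, 27.4, 27.4, 27.2, 27.2, 27.2, 27.2, 27.2, 27.1, 27.3, 27.2, 27.4],
    "flat-3": [27.0, 27.8, 27.6, 27.7, 27.8, 27.8, 27.8, 27.7, 27.8, 27.6, 27.8, 27.8],
    "gzip-3": [25.8, 26.8, 26.6, 26.5, 26.5, 26.5, 26.6, 26.4, 26.5, 26.7, 26.6, 26.7],
}


def summarize(samples):
    """Return (mean, sample standard deviation, deviation as a percentage of the mean)."""
    m = mean(samples)
    s = stdev(samples)
    return m, s, 100.0 * s / m


for tree, samples in RUNS.items():
    m, s, pct = summarize(samples)
    print(f"{tree}: mean {m:.2f} s, stdev {s:.2f} s ({pct:.2f}%)")
```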
flat-1 is the original tree, created via tar. gzip-1 is a cp -a copy of
it, per-file compressed afterwards. flat-2 is a cp -a copy of flat-1,
gzip-2 is a cp -a copy of gzip-1. flat-3/gzip-3 are cp -a copies of
note that gzip-1 is ~40% slower due to the 'sparse layout', so its
results approximate a repository with 'bad' file layout. I'd not expect
GIT repositories to have such a layout normally, so we can disregard it.
flat-2/3 and gzip-2/3 can be directly compared. Firstly, the results
show that the on-disk layout cannot be constructed reliably - there's a
1% systematic difference between flat-2 and flat-3, and a 3% systematic
difference between gzip-2 and gzip-3 - both systematic errors are larger
than the 0.5% standard deviation, so they are not measurement errors but
real layout properties of these trees.
the most interesting result is that gzip-2 is 2.5% faster than flat-2,
and gzip-3 is 4% faster than flat-3. These differences are close to the
layout-related systematic error, but slightly above it, so i'd conclude
that a compressed repository is 3% faster on this hardware.
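The percentages quoted in this discussion follow from the average column. Here is a small Python sketch of that arithmetic; the helper name `pct_diff` is hypothetical, and the choice of the slower tree as the baseline is an assumption about how the figures were derived.

```python
# Average elapsed time in seconds, taken from the final column of the table.
AVG = {
    "flat-1": 29.5, "gzip-1": 40.8,
    "flat-2": 27.9, "gzip-2": 27.2,
    "flat-3": 27.6, "gzip-3": 26.5,
}


def pct_diff(baseline, other):
    """Difference between two timings as a percentage of the baseline."""
    return 100.0 * abs(baseline - other) / baseline


# Compressed vs. uncompressed, for the pairs with comparable layout.
for n in (2, 3):
    flat, gz = AVG[f"flat-{n}"], AVG[f"gzip-{n}"]
    print(f"gzip-{n} vs flat-{n}: {pct_diff(flat, gz):.1f}% faster")

# Layout-related differences between copies of the same kind of tree.
print(f"flat-2 vs flat-3: {pct_diff(AVG['flat-2'], AVG['flat-3']):.1f}%")
print(f"gzip-2 vs gzip-3: {pct_diff(AVG['gzip-2'], AVG['gzip-3']):.1f}%")

# Penalty of the 'sparse' layout.
print(f"gzip-1 vs flat-1: {pct_diff(AVG['flat-1'], AVG['gzip-1']):.1f}% slower")
```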
(since these results were in line with my expectations i double-checked
everything again and did another 10 reboot tests - same results.)
conclusion [*]: there's a negligible cache-cold performance hit from
using an uncompressed repository, because cache-cold performance is
dominated by number of seeks, which is almost identical in the two
[*] lots of conditionals apply: these weren't flat/compressed GIT
repositories (although they were quite similar to them), nor was the GIT
workload measured (although the one measured should be quite close to