Sorry, I'm not great at deciphering Linux diagnostics (I'm relatively
new to it, a year or two), but I did a little poking around to see what
might be causing trouble. Wikipedia had these choice bits to say about
the C3 chip design:
C3
* Because memory performance is the limiting factor in many
benchmarks, VIA processors implement large primary caches, large
TLBs <http://en.wikipedia.org/wiki/Translation_Lookaside_Buffer>,
and aggressive prefetching
<http://en.wikipedia.org/wiki/Prefetching>, among other
enhancements. While these features are not unique to VIA, memory
access optimization is one area where they have not dropped
features to save die space. In fact generous primary caches (128K)
have always been a distinctive hallmark of Centaur / VIA designs.
* Clock frequency is in general terms favored over increasing
instructions per cycle. Complex features such as out-of-order
instruction execution are deliberately not implemented, because
they impact the ability to increase the clock rate, require a lot
of extra die space and power, and have little impact on
performance in several common application scenarios. Internally,
the C7 has 16 pipeline stages.
* The pipeline is arranged to provide one-clock execution of the
heavily used register--memory and memory--register forms of x86
instructions. Several frequently used instructions require fewer
pipeline clocks than on other x86 processors.
* Infrequently used x86 instructions are implemented in microcode
<http://en.wikipedia.org/wiki/Microcode> and emulated. This saves
die space and reduces power consumption. The impact upon the
majority of real world application scenarios is minimized.
* These design guidelines are derivative from the original RISC
<http://en.wikipedia.org/wiki/RISC> advocates, who stated a
smaller set of instructions, better optimized, would deliver
faster overall CPU performance.
And they give stats on L1/L2 cache sizes that are pertinent:
Processor      Secondary    Die size        Die size
               cache (k)    130 nm (mm²)    90 nm (mm²)
C3 / C7        64/128       52              30
Athlon XP      256          84              N/A
Athlon 64      512          144             84
Pentium M      2048         N/A             84
P4 Northwood   512          146             N/A
P4 Prescott    1024         N/A             110
What I would take from this is:
A) The C3 does not have out-of-order instruction scheduling, so a lot
of places where a Pentium-class chip would fly through a hunk of code
that has numerous data dependencies will stall like crazy on a C3,
causing tons of wasted cycles which show up as CPU usage (the quote
above puts the pipeline at 16 stages, at least for the C7, so a stall
that drains the pipe can cost on the order of that many cycles).
Calculating hashes is a pretty tight loop, so that will probably
increase the total clocks required to perform a hash computation.
B) The C3 has a pretty small on-chip L2 cache, but a large L1. It may
be adequate for this task, it may not... hard to say without getting
performance counters straight from the chip while running a backup, to
see how many cache misses you have. Chances are that running the OS
and Perl, plus the large data sets that are flooding through the chip,
demands a lot from such a small cache. It may be that the data itself
is always in the cache but the code for other tasks is being evicted,
so task switches are very expensive. Or possibly the data is so large,
or iterated in just the wrong way, that ~150k is too small a working
space to compute hashes. A cache miss on every 64 bytes would show up
as an incredible CPU hit.
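To put a rough number on that last point, here is a back-of-the-envelope
sketch in Python. It only does the arithmetic for "one miss per 64-byte
line"; the function name, the miss penalty, and the clock rate are all
my own placeholders, not measured C3 figures, so plug in real values if
you can find them.

```python
def miss_overhead_seconds(total_bytes, miss_penalty_cycles, clock_hz, line_bytes=64):
    """Estimate seconds lost if every cache line of the data misses once.

    line_bytes=64 follows the "every 64 bytes" figure in the text; the
    penalty and clock rate are inputs the caller has to guess or measure.
    """
    if clock_hz <= 0 or line_bytes <= 0:
        raise ValueError("clock_hz and line_bytes must be positive")
    # Number of cache lines touched, rounded up (ceiling division).
    misses = -(-total_bytes // line_bytes)
    return misses * miss_penalty_cycles / clock_hz


if __name__ == "__main__":
    # Placeholder inputs only: 1 GiB of data, a guessed 100-cycle miss
    # penalty, and a guessed 1 GHz clock.
    print(miss_overhead_seconds(2**30, miss_penalty_cycles=100, clock_hz=1_000_000_000))
```

This ignores prefetching and any overlap between misses and useful
work, so treat the result as an upper-bound style estimate at best.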
You can test situation B) by going into the BIOS (CMOS) setup,
disabling the L2 cache (or both L1 and L2 if you have to), and
re-running a 'quick' backup to compare the times. If the cache is
already being blown constantly, disabling it will have little effect
on the run time. Otherwise, the backup should run something like 5x
slower with the cache disabled, which would indicate the cache is
doing its job and is not the bottleneck.
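If it helps, the comparison boils down to one ratio. A minimal sketch,
assuming you have the two wall-clock times in seconds; the function
names and the 1.2 cutoff for "little effect" are arbitrary choices of
mine, not anything BackupPC reports.

```python
def slowdown_ratio(seconds_cache_on, seconds_cache_off):
    """How many times slower the backup ran with the cache disabled."""
    if seconds_cache_on <= 0:
        raise ValueError("seconds_cache_on must be positive")
    return seconds_cache_off / seconds_cache_on


def interpret(ratio, little_effect_below=1.2):
    """Map the ratio onto the two outcomes described above.

    little_effect_below is an assumed cutoff; adjust it to taste.
    """
    if ratio < little_effect_below:
        return "little effect: the cache was probably being blown already"
    return "clear slowdown: the cache was helping, so it is probably not the bottleneck"


if __name__ == "__main__":
    # Example times are made up: 10 minutes with cache, 50 minutes without.
    ratio = slowdown_ratio(600.0, 3000.0)
    print(ratio, interpret(ratio))
```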
I will say that your server has a ton more files than mine do, so
perhaps you're also being hit by a per-file overhead... maybe packet
processing costs are eating your lunch? It's also possible your network
driver is doing a lot in software that counts toward your CPU usage. If
your server is also experiencing high load, I would suspect a per-file
overhead of the transfer protocol rather than a specific hardware
problem with the C3, since that would also show up on a much beefier
box.
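One way to get a feel for per-file cost versus per-byte cost on the box
itself is to hash the same number of bytes as many small files and as
one big file. This is only a sketch: it measures local open/read/hash
overhead, not the transfer protocol or the network driver, and MD5 via
Python's hashlib is just a stand-in for whatever hashing the backup
actually does. All function names, sizes, and counts are made up for
illustration.

```python
import hashlib
import os
import tempfile
import time


def hash_file(path, chunk_size=65536):
    """Return the MD5 hex digest of a file, read in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_files(directory, prefix, count, size_each):
    """Write `count` files of `size_each` random bytes; return their paths."""
    payload = os.urandom(size_each)
    paths = []
    for index in range(count):
        path = os.path.join(directory, f"{prefix}{index}.bin")
        with open(path, "wb") as handle:
            handle.write(payload)
        paths.append(path)
    return paths


def time_hashing(paths):
    """Seconds of wall-clock time spent opening, reading, and hashing."""
    start = time.perf_counter()
    for path in paths:
        hash_file(path)
    return time.perf_counter() - start


def compare(total_bytes=1 << 24, small_count=4096):
    """Time many small files against one big file of the same total size."""
    size_each = total_bytes // small_count
    with tempfile.TemporaryDirectory() as directory:
        small = make_files(directory, "small", small_count, size_each)
        big = make_files(directory, "big", 1, size_each * small_count)
        return time_hashing(small), time_hashing(big)


if __name__ == "__main__":
    t_small, t_big = compare()
    print(f"many small files: {t_small:.3f}s, one big file: {t_big:.3f}s")
```

If the many-small-files run is much slower for the same byte count, the
overhead is per file rather than per byte. The files were just written,
so they will mostly come from the page cache, not the disk.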
Hope this helps,
JH
_______________________________________________
BackupPC-users mailing list
BackupPC-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/backuppc-users
http://backuppc.sourceforge.net/