Carl Lowenstein wrote:

Apologies for the somewhat testy tone above.


I'm going to pretend I didn't read it. ;-)

More information, gathered more at leisure.


How long does it take just to read the CD?
$ time dd if=/dev/scd0 of=/dev/null bs=2k
2m 20.5s


Is this the user + sys time, or the real time?

Hmm... do you know if K3b is reading in 2k chunks? I'm wondering if it's doing something stupid like reading a byte at a time and without any buffering.
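To make the buffering question concrete, here is a minimal sketch of what I mean. It is not K3b's code; the function name md5_in_chunks and the in-memory stand-in data are made up for illustration. It feeds the same bytes to MD5 one byte at a time and then in 2k chunks. The digest comes out the same either way; only the number of read calls changes, and on a real unbuffered device each of those calls has a cost.

```python
import hashlib
import io


def md5_in_chunks(stream, chunk_size=2048):
    """Return (hex digest, number of non-empty reads) for a binary stream."""
    digest = hashlib.md5()
    reads = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # empty bytes object means end of stream
            break
        digest.update(chunk)
        reads += 1
    return digest.hexdigest(), reads


if __name__ == "__main__":
    # 16 KiB of stand-in data; an assumption for the demo, not a real CD image.
    data = bytes(range(256)) * 64
    for size in (1, 2048):
        md5, reads = md5_in_chunks(io.BytesIO(data), size)
        print(f"chunk={size:5d} reads={reads:6d} md5={md5}")
```

Pointing the same function at a file opened with open(path, "rb", buffering=0) would show the cost of small unbuffered reads in wall-clock time rather than just in the read count.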

None of this explains why it takes 20 minutes to compute the MD5 sum
of the hard-drive file within the K3b "write and check CD" procedure.
Nor does it explain why it takes 20 minutes to read the CD image and
compute its MD5 sum in the same procedure.


Okay. Now I'm really confused here. Are you saying that when K3b does a read and check against the CD drive, it completes in less than 1/10th the time it takes to do the same thing with the same data, but reading from your hard drive?

There must be some weird software interference to make the whole
process take something like 5x to 6x the sum of its parts. There
should be little overlap between the CPU-bound MD5sum and the
I/O-bound disk reading.


It almost seems like K3b is doing more than just reading and computing md5sums when it's doing this work.
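One way to check the "sum of its parts" figure outside K3b is to time the two phases yourself. The following is only a rough sketch under my own assumptions: time_phases is a hypothetical helper, the 2k chunk size simply mirrors the dd test, and the path is whatever image file you want to test. It times a plain read pass, then a read-plus-MD5 pass over the same file.

```python
import hashlib
import time


def time_phases(path, chunk_size=2048):
    """Time a read-only pass, then a read-plus-MD5 pass, over one file.

    Returns (read_only_seconds, read_and_hash_seconds, md5_hex_digest).
    """
    # Pass 1: read the file and discard the data.
    start = time.perf_counter()
    with open(path, "rb") as f:
        while f.read(chunk_size):
            pass
    read_only = time.perf_counter() - start

    # Pass 2: read the file again and feed every chunk to MD5.
    digest = hashlib.md5()
    start = time.perf_counter()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    read_and_hash = time.perf_counter() - start

    return read_only, read_and_hash, digest.hexdigest()
```

Bear in mind that the second pass may be served from the page cache, so for a hard-drive file the difference between the two numbers is closer to the pure MD5 cost than to a second round of disk I/O. If both numbers are small compared with what K3b takes, the extra time is going somewhere other than reading and hashing.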

If this demonstrates the advantages of ObjectOriented Programming, I
don't see it.


Sigh. One of the challenges with analysing performance problems is avoiding drawing conclusions through prejudices. In your original post, you referred to the MD5 algorithm employed by K3b as being "obfuscated" and suggested that this was the source of the problem. In fact, the code used to implement the algorithm is actually from RSA's own reference library for MD5. With modern compilers (including gcc), its performance is virtually indistinguishable from the C implementation.

Blaming OOP is a particularly bad choice here. Unlike the languages you pointed to on that web page, C++'s runtime is basically a superset of C's, and C++ is structured in such a way that "you only pay the price for the features you use". The C++ md5 implementation doesn't use any features (OOP or otherwise) that impose a performance penalty over C, so it is no surprise that it exhibits similar performance. Indeed, the Perl and Python implementations on the page you pointed to are both non-OOP in structure and in terms of the features they take advantage of, and their performance problems don't stem from anything about OOP.

What is really missing here is a profile of K3b when all this is going on. I suspect running something like oprofile on your system (after you install debug info) while this is going on will prove to be very enlightening. If you're interested in fixing this problem, that should be your first step. I'll be truly surprised if the bulk of its time is spent doing the md5 calculations. I suspect the code is either doing something else entirely that we're not aware of or is spending a lot of time interfacing either with the I/O subsystem or the md5 library. If it's the latter, I'd expect this problem could be cleared up fairly easily by removing the bad bits of code.

--Chris
--
[email protected]
http://www.kernel-panic.org/cgi-bin/mailman/listinfo/kplug-lpsg
