On Tue, 15 Mar 2005 00:48:10 -0800, Christopher Smith <[EMAIL PROTECTED]> wrote:
> Carl Lowenstein wrote:
>
> >Apologies for the somewhat testy tone above.
> >
> I'm going to pretend I didn't read it. ;-)
>
> >More information, gathered more at leisure.
> >
> >How long does it take just to read the CD?
> >$ time dd if=/dev/scd0 of=/dev/null bs=2k
> > 2m 20.5s
> >
> Is this the user + sys time, or the real time?

Real time.

> Hmm... do you know if K3b is reading in 2k chunks? I'm wondering if it's
> doing something stupid like reading a byte at a time and without any
> buffering.

Looking at < http://bugs.kde.org > at Bug 83832 "Verifying written data
is very slow" I deduce that K3B is reading in 20k chunks.

> >None of this explains why it takes 20 minutes to compute the MD5 sum
> >of the hard-drive file within the K3b "write and check CD" procedure.
> >Nor does it explain why it takes 20 minutes to read the CD image and
> >compute its MD5 sum in the same procedure.
> >
> Okay. Now I'm really confused here. Are you saying that when K3b does a
> read and check against the CD drive, it completes in less than 1/10th
> the time it takes to do the same thing with the same data, but reading
> from your hard drive?

No. I am saying that each of them takes 20 minutes.

> >There must be some weird software interference to make the whole
> >process take something like 5x to 6x the sum of its parts. There
> >should be little overlap between the CPU-bound MD5sum and the
> >I/O-bound disk reading.
> >
> It almost seems like K3b is doing more than just reading and computing
> md5sums when it's doing this work.
>
> >If this demonstrates the advantages of ObjectOriented Programming, I
> >don't see it.
> >
> Sigh. One of the challenges with analysing performance problems is
> avoiding drawing conclusions through prejudices. In your original post,
> you referred to the MD5 algorithm employed by K3b as being "obfuscated"
> and suggested that this was the source of the problem. In fact, the code
> used to implement the algorithm is actually from RSA's own reference
> library for MD5. With modern compilers (including gcc), its performance
> is virtually indistinguishable from the C implementation.

The code is a re-write of the RSA reference library implementation, done
in C++ for the KDE standard library.
Measured more or less in isolation, it takes only 1.5x longer than the
standard C implementation.

> Blaming OOP is a particularly bad choice here. Unlike the languages you
> pointed to on that web page, C++'s runtime is basically a superset of
> C's, and C++ is structured in such a way that "you only pay the price
> for the features you use". The C++ md5 implementation doesn't use any
> features (OOP or otherwise) that impose a performance penalty over C, so
> it is no surprise that it exhibits similar performance. Indeed, the Perl
> and Python implementations on the page you pointed to are both non-OOP
> in structure and in terms of features they take advantage of, and their
> performance problems don't stem from anything about OOP.
>
> What is really missing here is a profile of K3b when all this is going
> on. I suspect running something like oprofile on your system (after you
> install debug info) while this is going on will prove to be very
> enlightening. If you're interested in fixing this problem, that should
> be your first step. I'll be truly surprised if the bulk of its time is
> spent doing the md5 calculations.

My second pass at making measurements and trying to isolate the parts of
the problem shifts most of the blame away from the MD5 computation. I now
have to understand how K3b can read a CD image in 3:30, calculate an
MD5 sum in 1:35, and still take 20:00 to do the two jobs together. Worst
case, you should be able to do them separately in about 5 minutes.

> I suspect the code is either doing
> something else entirely that we're not aware of or is spending a lot of
> time interfacing either with the I/O subsystem or the md5 library. If
> it's the latter, I'd expect this problem could be cleared up fairly
> easily by removing the bad bits of code.

I may yet find myself driven to try to compile the whole K3b so that I
can profile it. The prospect is not a happy one, because of the typical
library bloat of programs like this.

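For what it's worth, the "separately versus together" comparison can be
tried outside K3b. What follows is only a rough sketch in Python, not
K3b's code: the function names are made up for illustration, and the 20k
default chunk size is just the figure deduced from the bug report. It
reads a file in fixed-size chunks, accumulates the time spent in read()
and in the MD5 update separately, and computes the ratio of a combined
run to the sum of its parts.

```python
import hashlib
import time


def time_read_and_md5(path, chunk_size=20 * 1024):
    """Return (read_seconds, md5_seconds, hex_digest) for one pass over path.

    chunk_size defaults to 20 KiB only to mirror the figure deduced from
    the KDE bug report; it is an assumption, not K3b's actual read loop.
    """
    md5 = hashlib.md5()
    read_seconds = 0.0
    md5_seconds = 0.0
    # buffering=0 disables Python-level buffering so each read() call
    # goes to the operating system with the requested chunk size.
    with open(path, "rb", buffering=0) as f:
        while True:
            t0 = time.perf_counter()
            chunk = f.read(chunk_size)
            t1 = time.perf_counter()
            read_seconds += t1 - t0
            if not chunk:          # empty bytes object means end of file
                break
            md5.update(chunk)
            md5_seconds += time.perf_counter() - t1
    return read_seconds, md5_seconds, md5.hexdigest()


def slowdown(total_seconds, *part_seconds):
    """Ratio of a combined run to the sum of its separately timed parts."""
    return total_seconds / sum(part_seconds)
```

With the times quoted above, slowdown(20 * 60, 3 * 60 + 30, 60 + 35)
comes out a little under 4. Note that a second run over the same
hard-drive file will mostly be served from the page cache, so the read
time is only meaningful on a first pass or against the CD device.
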
carl
--
carl lowenstein    marine physical lab    u.c. san diego
[EMAIL PROTECTED]

--
[email protected]
http://www.kernel-panic.org/cgi-bin/mailman/listinfo/kplug-lpsg
