On Mon, 14 Mar 2005 17:02:05 -0800, Carl Lowenstein
<[EMAIL PROTECTED]> wrote:
> On Sun, 13 Mar 2005 11:37:59 -0800, Christopher Smith <[EMAIL PROTECTED]>
> wrote:
> > Carl Lowenstein wrote:
> >
> > >I was using K3b on my Thinkpad laptop (300MHz PII). I burned an ISO
> > >image to a CDR disc, with the option "verify the MD5 checksum".
> > >Burning the image took 3.5 minutes. Calculating the checksum of the
> > >original 650MB image took some 20 minutes, and another 20 minutes to
> > >read the disc and calculate its checksum. During all of this 40 minute
> > >interval the CPU was about 100% busy.
> > >
> > >Finding this hard to believe, I used the "md5sum" program from GNU
> > >core utils 5.2.1 and found that it took 37 seconds to calculate the
> > >checksum from the image.
> > >
> > >Some Google research has turned up < http://www.equi4.com/md5/ >
> > >"The MD5 algorithm in different programming languages". Timings vary
> > >by several orders of magnitude. Chasing through the K3b sources, I
> > >think I have found the algorithm as part of the kdecore Library, and
> > >it seems to be written in obfuscated C++, and documented in
> > >/usr/share/doc/HTML/en/kdelibs-apidocs\
> > >/kdecore/html/kmdcodec_8h-source.html
> > >
> > >Obviously, some computers are faster than others, and if I was doing
> > >this on a 2.8GHz P4 it would run about 10x faster. But this is
> > >ridiculous.
> > >
> > >
> > I'd be willing to bet your CPU was getting maxed because of issues
> > with your driver, rather than the md5 calculations. Depending on how
> > the drivers were setup, scanning the CD as fast as possible may
> > result in maxing out the CPU. Check out what happens when you
> > do an md5sum of /dev/cdrom (whatever that is linked to).
>
> You know, I have been using computer hardware for nigh on to 40 years
> now. And I have not recently seen such a misunderstanding of hardware
> vs. software speeds.
>
> The md5 calculation takes 20 minutes when done by K3b on the 650MB ISO
> image on the hard drive. It also takes 20 minutes when done directly
> by scanning the CD.
>
> Without using K3b:
> " time dd if=/dev/scd0 of=/dev/null bs=2k
> " 32546+0 records in
> " 32546+0 records out
> " real 2m20.542s
> " user 0m0.869s
> " sys 0m10.726s
>
> Observe that the hardware can scan the disk in 2 min 20 sec without
> significant CPU load. Also observe that this CD drive is achieving a
> performance of (74 min / 2.34 min) = 31.6 x.
>
> Try piping directly to md5sum:
>
> " time dd if=/dev/scd0 bs=2k | md5sum
> . . .
> " real 2m 53.991s
> " user 0m 24.075s
> " sys 0m24.108s
>
> Calculating md5sum on the fly while reading the CD adds some 33
> seconds to the total, and is observed to use 33% to 50% CPU time.
> Interestingly enough, calculating the md5sum from a disk image on the
> hard drive takes 37 seconds.
>
> My point is that whatever committee wrote K3b, they chose a very
> inefficient implementation of that algorithm. And it came from a
> standard k3b library.

Apologies for the somewhat testy tone above.
More information, gathered more at leisure.
Also following some reading of the Bugzilla entries appropriate to K3b.

# $Id: K3b_notes,v 1.1 2005/03/15 06:21:42 cdl Exp cdl $

Experiences using K3b on a slower computer (300MHz Thinkpad)

Misc. other hardware information: External hard drive and CDRW are on
USB2 connections through a Cardbus -> USB2 adapter.

Burning a 650MB .iso image takes 2.5 minutes.

Checking MD5 sum takes 20 minutes to do .iso image, another 20 minutes
to read and check burned CD.

Try to separate this into its components. First of all, using
/usr/bin/md5sum (from coreutils 5.2.1):

  $ time md5sum susepro92.cd1.iso
  0m 45.2s

How long does it take just to read the .iso file?

  $ time dd if=susepro92.cd1.iso of=/dev/null bs=2k
  0m 36.9s

How long does it take just to read the CD?

  $ time dd if=/dev/scd0 of=/dev/null bs=2k
  2m 20.5s

Try md5sum in a pipeline:

  $ time dd if=/dev/scd0 bs=2k | md5sum
  2m 54.6s

  $ time md5sum /dev/scd0
  2m 40.7s

  $ time cmp susepro92.cd1.iso /dev/scd0
  2m 26.6s

Note that "cmp" between the file on the hard drive and the CD in the
reader takes less time than md5sum of the CD, but more time than md5sum
of the file.
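
To put these timings on a common scale, they can be converted to
throughput. This is only a rough sketch: it assumes the image is exactly
650 MB (the notes give only that rounded size), and the labels and the
function name are made up for illustration. The last entry is one
20-minute K3b verify pass, for comparison.

```python
# Assumption: the image is exactly 650 MB; the notes give only this rounded size.
IMAGE_MB = 650

# Elapsed times in seconds, copied from the measurements in these notes.
timings_s = {
    "md5sum of .iso file": 45.2,
    "dd read of .iso file": 36.9,
    "dd read of CD": 2 * 60 + 20.5,
    "dd | md5sum of CD": 2 * 60 + 54.6,
    "md5sum of CD": 2 * 60 + 40.7,
    "cmp of file vs. CD": 2 * 60 + 26.6,
    "one K3b verify pass": 20 * 60,
}


def throughput_mb_s(seconds, size_mb=IMAGE_MB):
    """Average throughput in MB/s for moving size_mb in the given time."""
    return size_mb / seconds


for label, seconds in timings_s.items():
    print(f"{label:22s} {throughput_mb_s(seconds):6.2f} MB/s")
```

On these assumptions the command-line tools all land in the range of a
few MB/s to about 18 MB/s, while a K3b verify pass works out to roughly
half a megabyte per second.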

Now use parts of K3b (timings by stopwatch):

Read ISO image from hard drive and compute MD5 sum
  1m 35s
Note this is about 2x the time (95 s vs. 45 s) to do the same
computation using /usr/bin/md5sum.

Copy ISO image from CD reader to file on hard drive
  3m 30s
Note this is "only" 1.5x the time to do the same data transfer using dd.
It is also considerably more time than it takes to burn the CD using
the same hardware.

None of this explains why it takes 20 minutes to compute the MD5 sum
of the hard-drive file within the K3b "write and check CD" procedure.
Nor does it explain why it takes 20 minutes to read the CD image and
compute its MD5 sum in the same procedure.

There must be some weird software interference to make the whole
process take something like 5x to 6x the sum of its parts. There
should be little overlap between the CPU-bound MD5sum and the
I/O-bound disk reading.
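
One way to check where the time goes is to time the reads and the hash
updates separately in a single pass over the file. The following is a
minimal sketch using Python's standard hashlib, not K3b's or kdecore's
code; the function name and the 64 KB default chunk size are arbitrary
choices for illustration.

```python
import hashlib
import time


def time_read_and_md5(path, chunk_size=64 * 1024):
    """Read path in chunks; return (read_seconds, hash_seconds, hex_digest)."""
    md5 = hashlib.md5()
    read_s = 0.0
    hash_s = 0.0
    with open(path, "rb") as f:
        while True:
            t0 = time.perf_counter()
            chunk = f.read(chunk_size)   # I/O-bound part
            t1 = time.perf_counter()
            read_s += t1 - t0
            if not chunk:                # empty read means end of file
                break
            md5.update(chunk)            # CPU-bound part
            hash_s += time.perf_counter() - t1
    return read_s, hash_s, md5.hexdigest()
```

Calling it as time_read_and_md5("susepro92.cd1.iso") or on /dev/scd0
(given read permission) and varying chunk_size would show whether the
reads or the hashing dominate, and whether very small reads alone can
account for a large slowdown.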

If this demonstrates the advantages of Object-Oriented Programming, I
don't see it.

carl
--
carl lowenstein marine physical lab u.c. san diego
[EMAIL PROTECTED]