On Sat, Jul 27, 2013 at 4:58 PM, Andy Bradford < amb-sendok-1377550706.oeilkncbciakkppah...@bradfords.org> wrote:
> Thus said Eric Rubin-Smith on Sat, 27 Jul 2013 16:31:46 -0400:
>
> > I tested this basic claim and do not believe it holds:
> >
> > [monk:~] $ head -c $(echo 392*1024*1024|bc) /dev/zero > foo
> > [monk:~] $ du -sch foo
> > 392M foo
> > 392M total
> > [monk:~] $ time md5sum foo
> > c6d8f8fc5c75fd6ecceb4edf42f3ac4d  foo
> >
> > real 0m1.324s
> > user 0m0.998s
> > sys  0m0.247s
>
> I believe this test is slightly flawed. You have 8095 files and
> directories for a total of 392M. This is not at all the same as 1 file
> that totals 392M. So your test doesn't account for the distribution of
> the data on the disk and the file system slowness that could result
> therefrom.

Good point! Not to mention duplicated syscall overhead etc. I ran a riff on your idea and got a very different result:

[monk:repo.fossil] $ time find . -type f -exec cat {} \; | md5sum -
3abe8f411181a328c7b64946ff6a9c7a  -

real 0m37.631s
user 0m2.973s
sys  0m11.543s

As you predicted, most of that time is spent on disk I/O, not e.g. in forking 'cat'. So that explains over half of the run-time for my fossil command.
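For anyone who wants to see the read/hash split directly rather than infer it from `time`, here is a rough Python sketch of the same experiment. It is not what was run above: the function name `hash_tree` is made up for illustration, and it visits files in sorted order, so its digest will generally not match the one from the `find | md5sum` pipeline.

```python
import hashlib
import os
import time


def hash_tree(root):
    """Feed every regular file under root into a single MD5.

    Returns (hex_digest, seconds_spent_reading, seconds_spent_hashing).
    Hypothetical helper for illustration; not part of fossil.
    """
    md5 = hashlib.md5()
    read_time = 0.0
    hash_time = 0.0
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # fixed traversal order, so the digest is repeatable
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            # Mirror `find -type f`: skip symlinks and non-regular files.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            with open(path, "rb") as fh:
                while True:
                    t0 = time.perf_counter()
                    chunk = fh.read(1 << 20)  # read up to 1 MiB at a time
                    t1 = time.perf_counter()
                    read_time += t1 - t0
                    if not chunk:
                        break
                    md5.update(chunk)
                    hash_time += time.perf_counter() - t1
    return md5.hexdigest(), read_time, hash_time
```

Calling `hash_tree(".")` from the top of a checkout returns the digest plus the two timings. On a second run the OS page cache will likely serve most reads, so only a cold-cache run says much about disk I/O.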
For the other half, I ran fossil under callgrind and found that at least 44% of its instruction reads were inside zlib, and at least 34% were spent updating the MD5 sum:

--------------------------------------------------------------------------------
            Ir
--------------------------------------------------------------------------------
41,797,779,918  PROGRAM TOTALS

--------------------------------------------------------------------------------
            Ir  file:function
--------------------------------------------------------------------------------
18,101,410,264  < /usr/src/debug/zlib-1.2.5/inflate.c:inflate (55531x) [/lib64/libz.so.1.2.5]
18,101,410,264  *  /usr/src/debug/zlib-1.2.5/inffast.c:inflate_fast [/lib64/libz.so.1.2.5]

13,824,797,833  < /home/eas/Fossil-c9cb6e72932fefbe/./src/md5.c:MD5Update (24296657x) [/usr/local/bin/fossil-1.26-eas-built]
         3,983  < /home/eas/Fossil-c9cb6e72932fefbe/./src/md5.c:MD5Final (7x) [/usr/local/bin/fossil-1.26-eas-built]
13,824,801,816  *  /home/eas/Fossil-c9cb6e72932fefbe/./src/md5.c:MD5Transform [/usr/local/bin/fossil-1.26-eas-built]

(and those are just the top two functions). All that uncompressing seems to come from blob_uncompress.

So I guess the only remaining question is whether all those blob uncompresses are really necessary. I assume yes -- and in any case I have my answers. :-)

Thanks again.

Eric
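As a cross-check on the percentages, the shares can be recomputed from the Ir counts in the callgrind listing. The short Python sketch below uses only the three numbers shown there; the variable and function names are made up for illustration. The two listed functions alone come to roughly 43% and 33% of the program total, so the slightly higher figures quoted in the message presumably include zlib and MD5 functions that are not in the excerpt (that part is an assumption).

```python
# Instruction-read (Ir) counts copied from the callgrind listing.
TOTAL_IR = 41_797_779_918          # PROGRAM TOTALS
INFLATE_FAST_IR = 18_101_410_264   # inffast.c:inflate_fast
MD5_TRANSFORM_IR = 13_824_801_816  # md5.c:MD5Transform


def share(part, total):
    """Return part as a percentage of total."""
    return 100.0 * part / total


zlib_pct = share(INFLATE_FAST_IR, TOTAL_IR)
md5_pct = share(MD5_TRANSFORM_IR, TOTAL_IR)

print(f"inflate_fast: {zlib_pct:.1f}% of instruction reads")
print(f"MD5Transform: {md5_pct:.1f}% of instruction reads")
print(f"combined:     {zlib_pct + md5_pct:.1f}%")
```

Either way, these two functions together account for about three quarters of the instruction reads.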