On Sat, Jul 27, 2013 at 4:58 PM, Andy Bradford <
amb-sendok-1377550706.oeilkncbciakkppah...@bradfords.org> wrote:

> Thus said Eric Rubin-Smith on Sat, 27 Jul 2013 16:31:46 -0400:
>
> > I tested this basic claim and do not believe it holds:
> >
> > [monk:~] $ head -c $(echo 392*1024*1024|bc) /dev/zero > foo
> > [monk:~] $ du -sch foo
> > 392M    foo
> > 392M    total
> > [monk:~] $ time md5sum foo
> > c6d8f8fc5c75fd6ecceb4edf42f3ac4d  foo
> >
> > real    0m1.324s
> > user    0m0.998s
> > sys     0m0.247s
>
> I  believe  this test  is  slightly  flawed.  You  have 8095  files  and
> directories for a total  of 392M. This is not at all the  same as 1 file
> that totals 392M.  So your test doesn't account for  the distribution of
> the data  on the  disk and  the file system  slowness that  could result
> therefrom.
>

Good point!  Not to mention duplicated syscall overhead etc.  I ran a riff
on your idea and got a very different result:

[monk:repo.fossil] $ time find . -type f -exec cat {} \; | md5sum -
3abe8f411181a328c7b64946ff6a9c7a  -

real    0m37.631s
user    0m2.973s
sys     0m11.543s
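
To take the per-file fork of 'cat' out of the picture entirely, the same
measurement can be done inside one process. The following is a minimal Python
sketch, not anything fossil does: the function name md5_of_tree and the chunk
size are illustrative assumptions. It skips symlinks, as 'find -type f' does,
but visits files in sorted order, so its digest will generally differ from the
one printed above.

```python
import hashlib
import os


def md5_of_tree(root, chunk_size=1 << 20):
    """Feed every regular file under root into a single MD5, in one process."""
    digest = hashlib.md5()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sort in place so the descent order is deterministic
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            with open(path, "rb") as handle:
                # Read in chunks so large files never sit in memory whole.
                for chunk in iter(lambda: handle.read(chunk_size), b""):
                    digest.update(chunk)
    return digest.hexdigest()
```

Timing a call such as md5_of_tree(".") against the find/cat pipeline should
show roughly how much of the wall-clock time is process creation rather than
reading the files.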

As you predicted, most of that time is spent on disk I/O, not e.g. in
forking 'cat'.  So that explains over half of the run-time for my fossil
command.
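
The split between CPU time and waiting can be read straight off the 'time'
output above. A small Python sketch of that arithmetic follows; the helper
name off_cpu_seconds is made up for illustration, and treating all off-CPU
time as disk I/O is an approximation, since it also covers things like
waiting on the pipe.

```python
def off_cpu_seconds(real, user, sys_time):
    """Wall-clock time not accounted for by user or system CPU time."""
    return real - (user + sys_time)


# Figures copied from the find/cat/md5sum run above.
real, user, sys_time = 37.631, 2.973, 11.543

waiting = off_cpu_seconds(real, user, sys_time)
print(f"on CPU : {user + sys_time:.3f}s")
print(f"waiting: {waiting:.3f}s ({waiting / real:.0%} of wall-clock)")
```

With these figures about 14.5 s is CPU time and about 23.1 s, roughly 61% of
the run, is spent waiting.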

For the other half, I ran fossil under callgrind and found that about 43%
of its instruction reads were inside zlib's inflate code, and about 33% were
spent updating the MD5 sum:

--------------------------------------------------------------------------------
            Ir
--------------------------------------------------------------------------------
41,797,779,918  PROGRAM TOTALS

--------------------------------------------------------------------------------
            Ir  file:function
--------------------------------------------------------------------------------

18,101,410,264  < /usr/src/debug/zlib-1.2.5/inflate.c:inflate (55531x) [/lib64/libz.so.1.2.5]
18,101,410,264  *  /usr/src/debug/zlib-1.2.5/inffast.c:inflate_fast [/lib64/libz.so.1.2.5]

13,824,797,833  < /home/eas/Fossil-c9cb6e72932fefbe/./src/md5.c:MD5Update (24296657x) [/usr/local/bin/fossil-1.26-eas-built]
         3,983  < /home/eas/Fossil-c9cb6e72932fefbe/./src/md5.c:MD5Final (7x) [/usr/local/bin/fossil-1.26-eas-built]
13,824,801,816  *  /home/eas/Fossil-c9cb6e72932fefbe/./src/md5.c:MD5Transform [/usr/local/bin/fossil-1.26-eas-built]

(and those are just the top two functions).
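
As a cross-check, the shares can be recomputed from the Ir counts in the
listing above. This is only a sketch in Python; the constant and function
names are illustrative, and the numbers are copied from the callgrind output.

```python
def share(part, total):
    """Return part as a percentage of total."""
    return 100.0 * part / total


# Instruction-read (Ir) counts copied from the callgrind listing.
TOTAL_IR = 41_797_779_918
INFLATE_FAST_IR = 18_101_410_264
MD5_TRANSFORM_IR = 13_824_801_816

print(f"inflate_fast: {share(INFLATE_FAST_IR, TOTAL_IR):.1f}%")
print(f"MD5Transform: {share(MD5_TRANSFORM_IR, TOTAL_IR):.1f}%")
print(f"combined    : {share(INFLATE_FAST_IR + MD5_TRANSFORM_IR, TOTAL_IR):.1f}%")
```

From the two entries shown, inflate_fast comes to about 43.3% and MD5Transform
to about 33.1% of all instruction reads, so together they account for roughly
three quarters of the total.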

All that uncompressing seems to come from blob_uncompress.  So I guess the
only remaining question is whether all those blob uncompresses are really
necessary.  I assume yes -- and in any case I have my answers. :-)

Thanks again.

Eric
_______________________________________________
fossil-users mailing list
fossil-users@lists.fossil-scm.org
http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
