Re: [fossil-users] fossil commit is extremely slow
On Sat, Jul 27, 2013 at 3:16 PM, Eric Rubin-Smith eas@gmail.com wrote:
> I have a largish repo I ingested from CVS (via git, as I previously described on this list). I'm using fossil 1.26. A tiny commit to a single file takes 63 seconds:
>
> [monk:code] $ time fossil commit -m "Test check-in"
> New_Version: c46175729e936137f58ef302308d1e95b62e6a61
>
> real    1m2.767s
> user    0m15.090s
> sys     0m7.227s
>
> I.e. ~22 seconds of CPU usage, and presumably the rest is on the disk. The box is pretty old (see below for /proc/cpuinfo), and I know that fossil is not written to be a speed demon -- but this still seems pretty ridiculous.

That is ridiculous. Most commits take less than a second, even on archaic machines, such as my 15-year-old PPC iBook clocked at 400MHz.

How many files are in your check-out? What's the total size of all those files (how big is the checkout)? Is the repository or the check-out on a network filesystem?

--
D. Richard Hipp
d...@sqlite.org
___
fossil-users mailing list
fossil-users@lists.fossil-scm.org
http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
Re: [fossil-users] fossil commit is extremely slow
On Sat, Jul 27, 2013 at 3:23 PM, Richard Hipp d...@sqlite.org wrote:
> How many files are in your check-out?

[monk:repo.fossil] $ find . | wc -l
8095

> What's the total size of all those files (how big is the checkout)?

[monk:repo.fossil] $ du -sch .
392M    .
392M    total

> Is the repository or the check-out on a network filesystem?

No and no.

Eric
Re: [fossil-users] fossil commit is extremely slow
If you're on Windows, add fossil.exe to your antivirus app's list of excluded processes.
Re: [fossil-users] fossil commit is extremely slow
On Sat, Jul 27, 2013 at 3:41 PM, Eric Rubin-Smith eas@gmail.com wrote:
> On Sat, Jul 27, 2013 at 3:23 PM, Richard Hipp d...@sqlite.org wrote:
>> What's the total size of all those files (how big is the checkout)?
>
> [monk:repo.fossil] $ du -sch .
> 392M    .
> 392M    total

That would be the culprit. As one of several self-checks (see http://www.fossil-scm.org/fossil/doc/trunk/www/selfcheck.wiki), Fossil always computes an MD5 checksum over the entire check-out and compares that to the content being checked in, to make sure they are identical. With a 392MB checkout on an older machine, that might easily take a minute.

The Fossil repositories for Fossil itself and for SQLite are just 14MB and 22MB, respectively. And I do most of my work on a fast machine, so I never notice the extra commit time needed for this self-check.

I think you can turn off this safety check using:

fossil setting repo-cksum off

Please try that, and let us know whether or not it solves your problem.

--
D. Richard Hipp
d...@sqlite.org
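For a rough feel of what this self-check involves, here is a shell sketch that approximates a whole-check-out MD5. It assumes the digest is taken file by file, in byte-wise sorted path order, over the relative path, a space, the size in decimal, a newline and then the file's bytes; that is how Fossil's file-format document describes the manifest R card, as far as I can tell, but treat the exact layout as an assumption. It also assumes every regular file in the tree is managed (only Fossil's `.fslckout`/`_FOSSIL_` checkout database is skipped), that no path contains a newline, and that GNU `md5sum` is available. The function name `checkout_md5` is made up for illustration.

```shell
# Rough approximation of a whole-check-out MD5 in the style of the manifest
# "R card". ASSUMPTIONS: every regular file under DIR is managed, no path
# contains a newline, and GNU md5sum is installed.
checkout_md5() {
  (
    cd "${1:-.}" || exit 1
    # Skip Fossil's checkout database; sort byte-wise (C locale) so the
    # order is a strict lexicographic one.
    find . -type f ! -name .fslckout ! -name _FOSSIL_ -print |
      LC_ALL=C sort |
      while IFS= read -r f; do
        size=$(wc -c < "$f" | tr -d ' ')      # size in ASCII decimal
        printf '%s %s\n' "${f#./}" "$size"    # "PATH SIZE\n" header
        cat "$f"                              # followed by the raw bytes
      done |
      md5sum | cut -d' ' -f1
  )
}
```

Running `time checkout_md5 .` from the top of a check-out shows how long reading and hashing every file takes on a given machine. The digest will not match Fossil's own if the assumptions above do not hold; the point is the cost of touching every byte, not the exact value.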
Re: [fossil-users] fossil commit is extremely slow
On Sat, Jul 27, 2013 at 4:15 PM, Richard Hipp d...@sqlite.org wrote:
> That would be the culprit. As one of several self-checks (see http://www.fossil-scm.org/fossil/doc/trunk/www/selfcheck.wiki), Fossil always computes an MD5 checksum over the entire check-out and compares that to the content being checked in, to make sure they are identical. With a 392MB checkout on an older machine, that might easily take a minute.

I tested this basic claim and do not believe it holds:

[monk:~] $ head -c $(echo 392*1024*1024|bc) /dev/zero > foo
[monk:~] $ du -sch foo
392M    foo
392M    total
[monk:~] $ time md5sum foo
c6d8f8fc5c75fd6ecceb4edf42f3ac4d  foo

real    0m1.324s
user    0m0.998s
sys     0m0.247s

So just over a second to calculate that hash on the same box. I retried this after dropping kernel caches to test whether it's the disk, and it still only took 3.6 seconds to calculate the hash.

Of course, that's just the time it takes to calculate the hash. Obviously it does not include the time spent concatenating the world together to send to your MD5 function. Perhaps there's a super-linear algorithm in that concatenation stuff?

Turning off repo-cksum *did* address the issue, at least by an order of magnitude:

[monk:code] $ fossil setting repo-cksum off
[monk:code] $ time fossil commit -m "test commit"
New_Version: 4d3b92dca8a617d6004bbe4e9c158fc11882720d

real    0m7.365s
user    0m0.627s
sys     0m0.398s

Does this leave any serious gaps in fault tolerance? The new performance is acceptable, though I'm still happy to keep digging around if you're curious (either about what was taking so long, or about what is still taking 7 seconds, or both).

Thanks, Richard.

Eric
Re: [fossil-users] fossil commit is extremely slow
On Sat, Jul 27, 2013 at 10:31 PM, Eric Rubin-Smith eas@gmail.com wrote:
> [monk:code] $ fossil setting repo-cksum off

FYI: if you want that setting used globally by default for your repos, add the -global flag. Otherwise it will apply only to that repo.

--
- stephan beal
http://wanderinghorse.net/home/stephan/
http://gplus.to/sgbeal
Re: [fossil-users] fossil commit is extremely slow
Thus said Eric Rubin-Smith on Sat, 27 Jul 2013 16:31:46 -0400:

> I tested this basic claim and do not believe it holds:
>
> [monk:~] $ head -c $(echo 392*1024*1024|bc) /dev/zero > foo
> [monk:~] $ du -sch foo
> 392M    foo
> 392M    total
> [monk:~] $ time md5sum foo
> c6d8f8fc5c75fd6ecceb4edf42f3ac4d  foo
>
> real    0m1.324s
> user    0m0.998s
> sys     0m0.247s

I believe this test is slightly flawed. You have 8095 files and directories for a total of 392M. This is not at all the same as 1 file that totals 392M. So your test doesn't account for the distribution of the data on the disk and the file system slowness that could result therefrom. A better comparison would be:

time find . -type f -exec md5sum {} \;

Andy
--
TAI64 timestamp: 400051f43494
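Andy's comparison can be pushed one step further to separate process start-up cost from disk I/O. The sketch below (function names hypothetical, GNU `find`/`md5sum` assumed) feeds every regular file under a directory to a single `md5sum`, once with one `cat` per file and once with `find -exec ... {} +`, which packs many paths into each `cat` invocation. Both stream the same bytes in the same traversal order, so the digests should match; only the number of processes differs.

```shell
# Hash every regular file under DIR as one concatenated stream, two ways.
# ASSUMPTION: GNU find and md5sum (Linux), as on the machine in this thread.

# One `cat` process per file.
stream_md5_per_file() {
  find "$1" -type f -exec cat {} \; | md5sum | cut -d' ' -f1
}

# `{} +` passes many paths to each `cat`, so process start-up cost mostly
# disappears and what remains is reading the files and hashing them.
stream_md5_batched() {
  find "$1" -type f -exec cat {} + | md5sum | cut -d' ' -f1
}
```

Time each one with bash's `time`, e.g. `time stream_md5_per_file .`. On Linux, running `sync; echo 3 | sudo tee /proc/sys/vm/drop_caches` (root required) before each run keeps the second measurement from being served out of the page cache, which would otherwise make it look much faster.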
Re: [fossil-users] fossil commit is extremely slow
On Sat, Jul 27, 2013 at 4:58 PM, Andy Bradford amb-sendok-1377550706.oeilkncbciakkppah...@bradfords.org wrote:
> I believe this test is slightly flawed. You have 8095 files and directories for a total of 392M. This is not at all the same as 1 file that totals 392M. So your test doesn't account for the distribution of the data on the disk and the file system slowness that could result therefrom.

Good point! Not to mention duplicated syscall overhead, etc. I ran a riff on your idea and got a very different result:

[monk:repo.fossil] $ time find . -type f -exec cat {} \; | md5sum -
3abe8f411181a328c7b64946ff6a9c7a  -

real    0m37.631s
user    0m2.973s
sys     0m11.543s

As you predicted, most of that time is spent on disk I/O, not e.g. in forking 'cat'. So that explains over half of the run time of my fossil command.

For the other half, I ran fossil under callgrind and found that at least 44% of its instruction reads were inside zlib, and at least 34% were spent updating the MD5 sum:

Ir
41,797,779,918  PROGRAM TOTALS

Ir              file:function
18,101,410,264  /usr/src/debug/zlib-1.2.5/inflate.c:inflate (55531x) [/lib64/libz.so.1.2.5]
18,101,410,264  *  /usr/src/debug/zlib-1.2.5/inffast.c:inflate_fast [/lib64/libz.so.1.2.5]

13,824,797,833  /home/eas/Fossil-c9cb6e72932fefbe/./src/md5.c:MD5Update (24296657x) [/usr/local/bin/fossil-1.26-eas-built]
         3,983  /home/eas/Fossil-c9cb6e72932fefbe/./src/md5.c:MD5Final (7x) [/usr/local/bin/fossil-1.26-eas-built]
13,824,801,816  *  /home/eas/Fossil-c9cb6e72932fefbe/./src/md5.c:MD5Transform [/usr/local/bin/fossil-1.26-eas-built]

(And those are just the top two functions.) All that uncompressing seems to come from blob_uncompress.
So I guess the only remaining question is whether all those blob uncompresses are really necessary. I assume yes -- and in any case I have my answers. :-)

Thanks again.

Eric
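The callgrind profile points at zlib's inflate and at MD5 as the two big CPU consumers. One rough way to estimate the inflate share outside Fossil is to hash a tree directly, then hash the same bytes recovered by decompressing stored copies, and compare the times. This is only an analogy under stated assumptions: gzip, which uses the DEFLATE format that zlib implements, stands in for Fossil's compressed blobs, and Fossil's delta storage is ignored. The helper names are hypothetical.

```shell
# Estimate how much decompression adds on top of hashing, by analogy.
# ASSUMPTIONS: gzip (DEFLATE) stands in for Fossil's zlib-compressed blobs,
# Fossil's delta storage is ignored, and paths contain no newlines.

# make_compressed_copy SRC DST: store every regular file under SRC as
# DST/PATH.gz. DST must not lie inside SRC.
make_compressed_copy() {
  (cd "$1" && find . -type f -print) |
    while IFS= read -r f; do
      mkdir -p "$2/$(dirname "$f")"
      gzip -c "$1/$f" > "$2/$f.gz"
    done
}

# hash_raw DIR: MD5 of all files under DIR, concatenated in sorted path order.
hash_raw() {
  (
    cd "$1" || exit 1
    find . -type f -print | LC_ALL=C sort |
      while IFS= read -r f; do cat "$f"; done
  ) | md5sum | cut -d' ' -f1
}

# hash_inflated DIR: the same digest, computed by decompressing each PATH.gz
# (sorted by the original PATH, so the byte stream matches hash_raw).
hash_inflated() {
  (
    cd "$1" || exit 1
    find . -type f -name '*.gz' -print | sed 's/\.gz$//' | LC_ALL=C sort |
      while IFS= read -r f; do gzip -dc "$f.gz"; done
  ) | md5sum | cut -d' ' -f1
}
```

After `make_compressed_copy DIR COPY`, comparing `time hash_raw DIR` with `time hash_inflated COPY` gives a ballpark for the decompression overhead; the two digests should be identical, so any extra time is the cost of inflating (plus reading the smaller compressed files).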