Apparently, though unproven, at 23:00 on Wednesday 17 November 2010, Paul Hartman did opine thusly:
> On Wed, Nov 17, 2010 at 2:35 PM, Mick <michaelkintz...@gmail.com> wrote:
> > Why is the second time so much faster? The size of the derived db was
> > the same on both occasions.
>
> I guess caching like Volker said too. What happens if you do something
> like this twice:
>
> sync; sh -c "echo 3 > /proc/sys/vm/drop_caches"; time updatedb

Now I'm intrigued. I did some quick and nasty tests.

First, mlocate's updatedb, with no measures taken to invalidate caches etc.:

# time updatedb

real    0m39.265s
user    0m2.245s
sys     0m0.228s

Then unmerge mlocate, emerge slocate, delete all dbs, and run slocate's
updatedb twice:

# rm /var/lib/[ms]locate/*db
# sync; sh -c "echo 3 > /proc/sys/vm/drop_caches"; time updatedb

real    1m35.365s
user    0m5.941s
sys     0m0.383s

# sync; sh -c "echo 3 > /proc/sys/vm/drop_caches"; time updatedb

real    1m34.929s
user    0m5.925s
sys     0m0.377s

slocate seems quicker than the few tests I'd already done with mlocate,
and it has no optimizations to re-use existing correct data in the db.

Now unmerge slocate, merge mlocate, do not delete the dbs, and run
mlocate's updatedb twice:

# sync; sh -c "echo 3 > /proc/sys/vm/drop_caches"; time updatedb

real    3m50.574s
user    0m7.277s
sys     0m0.361s

# sync; sh -c "echo 3 > /proc/sys/vm/drop_caches"; time updatedb

real    1m5.830s
user    0m2.088s
sys     0m0.173s

The second run is definitely quicker, as it only has to read the fs, not
write the entire index as well. But that initial run ...

The old slocate db was still around, possibly affecting the first run, so
delete both dbs and run mlocate's updatedb twice:

# rm /var/lib/[ms]locate/*db
# sync; sh -c "echo 3 > /proc/sys/vm/drop_caches"; time updatedb

real    3m51.592s
user    0m7.249s
sys     0m0.350s

# sync; sh -c "echo 3 > /proc/sys/vm/drop_caches"; time updatedb

real    1m7.662s
user    0m1.997s
sys     0m0.159s

Almost identical to the prior test, so the presence of slocate's db has
no effect on mlocate.
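As an aside, the sync-then-drop-caches incantation repeated in each run above
can be wrapped in a tiny helper so every cold-cache run is identical. This is
just a sketch of that pattern; the function name is mine, and writing to
drop_caches needs root:

```shell
#!/bin/sh
# cold_time: time a command with a cold cache, as in the runs above.
# Needs root: writing to /proc/sys/vm/drop_caches is root-only.
cold_time() {
    # Flush dirty pages first, otherwise dropping caches does little.
    sync
    # 3 = free the page cache plus the dentry and inode caches.
    echo 3 > /proc/sys/vm/drop_caches
    time "$@"
}

# Usage (as root):
#   cold_time updatedb
```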
Then I realized I hadn't measured how long they take to reindex a largely
cached fs, so I tried that with both, deleting the dbs before each test.

slocate:

# rm /var/lib/[ms]locate/*db
rm: cannot remove `/var/lib/[ms]locate/*db': No such file or directory
# sync; sh -c "echo 3 > /proc/sys/vm/drop_caches"; time updatedb

real    1m34.341s
user    0m5.929s
sys     0m0.397s

# time updatedb

real    0m2.454s
user    0m0.855s
sys     0m1.569s

mlocate:

# rm /var/lib/[ms]locate/*db
# sync; sh -c "echo 3 > /proc/sys/vm/drop_caches"; time updatedb

real    3m54.792s
user    0m7.215s
sys     0m0.350s

# time updatedb

real    0m0.538s
user    0m0.302s
sys     0m0.232s

0.5 seconds vs 2.5 seconds. Wow.

Conclusions:

1. mlocate is slow at building its db from scratch - it takes about 250%
   as long as slocate on the same task.
2. mlocate is faster at reindexing a largely-unchanged fs - it does it in
   about 66% of the time slocate took.
3. mlocate is insanely quick at reindexing when the fs data is in cache.

#1 is rare - most systems will only do it once.
#3 is silly and does not represent anything close to reality.
#2 is pretty realistic, and a 33% performance boost is significant.

I have no idea where the speed increase in #3 comes from. This is an ext4
fs - does ext4 keep an in-memory hash of inodes it reads? It seems to me
that would be a very clever and very useful thing for an fs to do.

--
alan dot mckinnon at gmail dot com
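For what it's worth, the effect in #3 is more likely the kernel's generic VFS
caches (the page cache plus the dentry and inode caches - the same ones that
drop_caches=3 empties) than anything ext4-specific, which would explain why a
rerun without dropping caches barely touches the disk. A rough way to peek at
how many dentries and inodes are currently cached:

```shell
#!/bin/sh
# The VFS layer caches directory entries and inodes in RAM, independent
# of the filesystem type; these proc files report current counts.
cat /proc/sys/fs/dentry-state   # nr_dentry nr_unused ...
cat /proc/sys/fs/inode-nr       # nr_inodes nr_free_inodes
```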