Apparently, though unproven, at 23:00 on Wednesday 17 November 2010, Paul 
Hartman did opine thusly:

> On Wed, Nov 17, 2010 at 2:35 PM, Mick <michaelkintz...@gmail.com> wrote:
> 
> > Why is the second time so much faster?  The size of the derived db was
> > the same on both occasions.
> 
> I guess caching like Volker said too. What happens if you do something
> like this twice:
> 
> sync; sh -c "echo 3 > /proc/sys/vm/drop_caches"; time updatedb

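A quick note on that one-liner before the numbers: drop_caches is the kernel's documented knob for throwing away clean caches, annotated below. (The sh -c wrapper only really matters under sudo, where the redirection into /proc must itself run as root.)

```shell
# /proc/sys/vm/drop_caches accepts (root only):
#   1 = drop the page cache (cached file data)
#   2 = drop dentries and inodes (directory/metadata caches)
#   3 = drop both
# sync first: only clean, written-back pages can actually be dropped.
sync
sh -c "echo 3 > /proc/sys/vm/drop_caches"
time updatedb
```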
Now I'm intrigued. I did some quick and nasty tests.

First, mlocate's updatedb. No measures taken to invalidate caches etc.:

# time updatedb
real    0m39.265s
user    0m2.245s
sys     0m0.228s


Then unmerge mlocate, emerge slocate, delete all dbs, run slocate's updatedb 
twice:

# rm /var/lib/[ms]locate/*db
# sync; sh -c "echo 3 > /proc/sys/vm/drop_caches"; time updatedb
real    1m35.365s
user    0m5.941s
sys     0m0.383s
# sync; sh -c "echo 3 > /proc/sys/vm/drop_caches"; time updatedb     
real    1m34.929s
user    0m5.925s
sys     0m0.377s

slocate seems quicker here than mlocate was in the few tests I'd already done, 
and it has no optimization to re-use existing valid data in the db. Now unmerge 
slocate, merge mlocate, do not delete the dbs, and run mlocate's updatedb twice:

# sync; sh -c "echo 3 > /proc/sys/vm/drop_caches"; time updatedb
real    3m50.574s
user    0m7.277s
sys     0m0.361s
# sync; sh -c "echo 3 > /proc/sys/vm/drop_caches"; time updatedb
real    1m5.830s
user    0m2.088s
sys     0m0.173s

The second run is definitely quicker, as it only has to read the fs, not write 
the entire index as well. But that initial run ... The old slocate db was still 
around and may have affected the first run, so delete both dbs and run 
mlocate's updatedb twice:

# rm /var/lib/[ms]locate/*db
# sync; sh -c "echo 3 > /proc/sys/vm/drop_caches"; time updatedb
real    3m51.592s
user    0m7.249s
sys     0m0.350s
# sync; sh -c "echo 3 > /proc/sys/vm/drop_caches"; time updatedb
real    1m7.662s
user    0m1.997s
sys     0m0.159s

Almost identical to the prior test, so the presence of slocate's db has no 
effect on mlocate. Then I realized I hadn't measured how long they took to 
reindex a largely cached fs, so I tried that with both, deleting the dbs 
before each test:

slocate:
# rm /var/lib/[ms]locate/*db
rm: cannot remove `/var/lib/[ms]locate/*db': No such file or directory
# sync; sh -c "echo 3 > /proc/sys/vm/drop_caches"; time updatedb
real    1m34.341s
user    0m5.929s
sys     0m0.397s
# time updatedb
real    0m2.454s
user    0m0.855s
sys     0m1.569s

mlocate:
# rm /var/lib/[ms]locate/*db
# sync; sh -c "echo 3 > /proc/sys/vm/drop_caches"; time updatedb
real    3m54.792s
user    0m7.215s
sys     0m0.350s
# time updatedb
real    0m0.538s
user    0m0.302s
sys     0m0.232s

0.5 second vs 2.5 seconds. Wow.

Conclusions:

1. mlocate is slow at building its db from scratch - it takes about 250% as 
long as slocate on the same task.
2. mlocate is faster at reindexing a largely-unchanged fs - it does it in 
about 66% of the time slocate took.
3. mlocate is insanely quick at reindexing a db that is in cache.

#1 is rare - most systems will only do it once
#3 is silly and does not represent anything close to reality
#2 is pretty realistic and a 33% performance boost is significant
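If anyone wants to check the guess behind #2 - that the repeat run mostly reads and barely writes - here is a rough sketch. It assumes a Linux /proc/diskstats and that the db lives on sda (a hypothetical device name, substitute your own); field 10 per line is total sectors written to that device:

```shell
# Sectors written to a disk across a command, via /proc/diskstats.
# Field 3 is the device name, field 10 is sectors written.
# "sda" is a placeholder - substitute the device holding the db.
written() { awk '$3=="sda" {print $10}' /proc/diskstats; }

before=$(written)
updatedb
sync                       # flush so the writes are actually counted
after=$(written)
echo "sectors written during updatedb: $((after - before))"
```

Run it once with the db deleted and once with it present; the delta should be far smaller the second time if the read-vs-write explanation holds.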

I have no idea where the speed increase in #3 comes from. This is an ext4 fs - 
does ext4 keep an in-memory hash of inodes it reads? It seems to me that would 
be a very clever and very useful thing for an fs to do.
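As far as I know the cache in question is the kernel's generic VFS dentry/inode cache rather than anything ext4-specific, which would explain why the warm rerun needs almost no disk at all. A crude way to watch it grow (no root needed; /proc/sys/fs/dentry-state is the dcache counter):

```shell
# /proc/sys/fs/dentry-state: field 1 = dentries currently cached,
# field 2 = unused-but-still-cached (freeable) dentries.
dstate() {
    awk '{printf "dentries cached: %s (unused: %s)\n", $1, $2}' \
        /proc/sys/fs/dentry-state
}

dstate
find /usr/share >/dev/null 2>&1   # walk a tree, much like updatedb does
dstate                            # the count should have grown
```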


-- 
alan dot mckinnon at gmail dot com
