Bug#535162: maildir-utils: deleting messages is really slow

Stefano Zacchiroli Fri, 30 Oct 2009 04:45:00 -0700

On Fri, Oct 30, 2009 at 02:45:59AM +0100, Norbert Preining wrote:
> from the issue where Stefano posted:
> > 2) I haven't straced it yet, but I read the corresponding source code. 
> > Apparently
> > you're first extracting all indexed messages from sqlite and then, one by 
> > one, you're
> > removing them with a) a DELETE query and b) a Xapian update. I don't know 
> > for Xapian
> > (never programmed with its API), but for Sqlite doing a single DELETE query 
> > with all
> > involved messages can be sensibly faster. If there is something similar for 
> > Xapian,
> > doing a massive remove instead of several single removals might be the key.
> 
> I added some debug statements to the source code and it seems that
> this is the reason.

Uhm, in fact, yesterday I've made some more tests, not directly patching
mu code, but rather working on a copy of its sqlite database. On that
copy, I tried deleting randomly about 4000 messages (over 20000 of
total) executing one delete per statement, of course I took care of
setting the same PRAGMAs of mu (temp_store and synchronous). In fact, it
wasn't slow enough to justify the times we're seeing: all DELETEs were
completed in 2 seconds.

So, yesterday, I was tempted to understand that the cause might have
been another, like the callbacks called upon each removal (which might
trigger I/O, which in theory should *not* be trigger by a single DELETE,
given that synchronous is set to "no") or else the Xapian
message-by-message removal. If you already have some debugging code
around, maybe you can try profiling with gprof so that we will have
actual timings?

Anyhow, trying out this path wouldn't be worse anyhow, I guess :)

> Only one question, if one of you has some experience with sql (as
> I don't). How does one specify a LIST of ids to be removed in a 
> single sql statement?

You just put them all together in a big OR, e.g.;

DELETE FROM message
WHERE id = 1
   OR id = 2
   OR id = 3
   ...;

If you go that way, take care also of preparing the statement only once
before executing (that might be another speed variable, actually).

> Or can we just collect severat sql statements and execute them in one
> go?

That's what transactions should do. AFAIR (but better check!) sqlite has
an implicit transactions wrapping a single instance of it, but I'm not
sure it is enabled by default while accessing the database via the API.

On Fri, Oct 30, 2009 at 03:03:54AM +0100, Norbert Preining wrote:
> One more thing while digging throught he code. Maybe using the
> transaction feature would be nice. For insert/updates a transaction
> is started.

He, see above :-)

> If this is possible for the delete, too, one could set the transactions size
> with --sqlite-transaction-size=1000000 and then in fact only one
> time the whole stuff would be committed, AFAIU.

Sounds like a nice idea.

Thanks for this investigation!
Cheers.

-- 
Stefano Zacchiroli -o- PhD in Computer Science \ PostDoc @ Univ. Paris 7
z...@{upsilon.cc,pps.jussieu.fr,debian.org} -<>- http://upsilon.cc/zack/
Dietro un grande uomo c'è ..|  .  |. Et ne m'en veux pas si je te tutoie
sempre uno zaino ...........| ..: |.... Je dis tu à tous ceux que j'aime

--
To UNSUBSCRIBE, email to [email protected]
with a subject of "unsubscribe". Trouble? Contact [email protected]

Bug#535162: maildir-utils: deleting messages is really slow

Reply via email to