On Sat, Oct 31, 2009 at 01:20:47PM +0100, Stefano Zacchiroli wrote:
> Hi Olly, how's it going? I hope that the trip back from the GSoC meeting
> found you well (whereas I'm sick again :-/).
Yes, I had a pretty good flight back, and apparently even managed to sleep
through some bad turbulence.
> Since doing a single transaction will require changing the callback
> structure in maildir-utils, I would very much welcome a comment by some
> Xapian-expert (I've never programmed using it as a library). Do you
> think that the performance issue can be solved by doing one big
> transaction? or else you believe something else is going on? (e.g.,
> should we try to do a delete of several messages at once, if that's
> supported by the Xapian API?).
I downloaded mu-0.4 and had a look. It is indeed doing a flushed transaction
for each deleted document, which will be a lot slower than letting Xapian batch
them up.
The value range query used to find the document id will also slow things down
a bit (though probably nowhere near as much as committing all those
transactions). It could also become a problem, since each such lookup is O(n)
in database size, so deleting 1% of a database (n/100 lookups at O(n) each) is
O(n*n) in database size.
At least the amount of recent spam being deleted is unlikely to grow as a fixed
percentage of the database size, but it's not good behaviour to build into the
system. It would be much better to put the unique id in a boolean term, e.g.:
    document.add_term("Q" + unique_id);
Then you can just do:
    db.delete_document("Q" + unique_id);
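To make the scaling argument concrete, here is a toy model in plain C++. It is
not Xapian code: ToyStore, make_store, total_scan_cost and total_index_cost are
names made up for this illustration, the linear scan is only a stand-in for an
O(n) lookup, and the hash map is only a stand-in for looking a document up by
its unique term. It counts the work done when deleting every step-th document.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Toy stand-in for a document store (hypothetical, NOT the Xapian API).
struct ToyStore {
    std::vector<std::string> ids;                        // unique id of each document
    std::unordered_map<std::string, std::size_t> by_id;  // id -> slot, like a unique term
};

// Build a store holding n documents with ids "msg0", "msg1", ...
ToyStore make_store(std::size_t n) {
    ToyStore s;
    s.ids.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        s.ids.push_back("msg" + std::to_string(i));
        s.by_id[s.ids.back()] = i;
    }
    return s;
}

// Comparisons needed to find one id by scanning from the start: O(n).
std::size_t scan_lookup_cost(const ToyStore& s, const std::string& id) {
    std::size_t comparisons = 0;
    for (const auto& candidate : s.ids) {
        ++comparisons;
        if (candidate == id) break;
    }
    return comparisons;
}

// Total comparisons to locate every step-th document by scanning.
// With step = 100 this models "delete 1% of the database".
std::size_t total_scan_cost(const ToyStore& s, std::size_t step) {
    assert(step > 0);
    std::size_t total = 0;
    for (std::size_t i = 0; i < s.ids.size(); i += step)
        total += scan_lookup_cost(s, s.ids[i]);
    return total;
}

// Total index lookups to locate the same documents: one per document.
std::size_t total_index_cost(const ToyStore& s, std::size_t step) {
    assert(step > 0);
    std::size_t lookups = 0;
    for (std::size_t i = 0; i < s.ids.size(); i += step)
        if (s.by_id.count(s.ids[i])) ++lookups;
    return lookups;
}
```

In this model, doubling the store from 1000 to 2000 documents roughly
quadruples the scan cost for deleting 1% (4510 versus 19020 comparisons), while
the indexed cost only doubles (10 versus 20 lookups).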
Cheers,
Olly