On Sun, Nov 01, 2009 at 04:03:01AM +0000, Olly Betts wrote:
> I downloaded mu-0.4 and had a look. It is indeed doing a flushed transaction
> for each deleted document, which will be a lot slower than letting Xapian
> batch them up.
OK, thanks for the info.
> The valuerange query to find the document id is also going to slow things down
> a bit (but probably nothing like as much as committing all those transactions),
> and could be potentially problematic since that will be O(n) in database size,
> so deleting 1% of a database will be O(n*n) in database size.
Argh!
> At least deleting recent spam is likely to not be a percentage of the database
> size, but it's not a good behaviour to build into the system. It would be much
> better to put the unique id in a boolean term, e.g.:
>
> document.add_term("Q" + unique_id);
>
> Then you can just do:
>
> db.delete_document("Q" + unique_id);
Thanks a lot also for the extra investigation, I'll try to mock up a
patch for both issues.
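
To make sure I understand the two problems, here is a rough toy model of them.
It is deliberately NOT Xapian code: ToyIndex and both helper functions are
hypothetical names invented for illustration, and the "commit" is just a
counter. It only assumes what is described above, namely that each document
carries a unique "Q" + unique_id term, so that a delete is a direct lookup
instead of a scan, and that commits can be issued once per batch rather than
once per document.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Toy stand-in for an index. Hypothetical; this is not the Xapian API.
struct ToyIndex {
    // Unique boolean term ("Q" + unique_id) -> slot of the document.
    std::unordered_map<std::string, std::size_t> by_term;
    std::vector<bool> live;    // live[slot] becomes false once deleted
    std::size_t commits = 0;   // how many times changes were flushed
    std::size_t pending = 0;   // changes made since the last commit

    void add(const std::string& unique_id) {
        by_term["Q" + unique_id] = live.size();
        live.push_back(true);
        ++pending;
    }

    // Delete by unique term: a single hash lookup, no scan over all documents.
    bool remove(const std::string& unique_id) {
        auto it = by_term.find("Q" + unique_id);
        if (it == by_term.end()) return false;
        live[it->second] = false;
        by_term.erase(it);
        ++pending;
        return true;
    }

    // Models the expensive flush; does nothing if there is nothing pending.
    void commit() {
        if (pending > 0) {
            ++commits;
            pending = 0;
        }
    }
};

// Slow pattern: one commit per deleted document.
std::size_t delete_flushing_each(ToyIndex& idx,
                                 const std::vector<std::string>& ids) {
    const std::size_t before = idx.commits;
    for (const auto& id : ids) {
        idx.remove(id);
        idx.commit();
    }
    return idx.commits - before;
}

// Batched pattern: delete everything, then commit once.
std::size_t delete_batched(ToyIndex& idx,
                           const std::vector<std::string>& ids) {
    const std::size_t before = idx.commits;
    for (const auto& id : ids) {
        idx.remove(id);
    }
    idx.commit();
    return idx.commits - before;
}
```

In this model, deleting k existing documents costs k commits with the first
helper and a single commit with the second, and neither has to walk the whole
index to find a document. How well that maps onto the real mu code is something
I still have to check while writing the patch.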
Cheers.
--
Stefano Zacchiroli -o- PhD in Computer Science \ PostDoc @ Univ. Paris 7
z...@{upsilon.cc,pps.jussieu.fr,debian.org} -<>- http://upsilon.cc/zack/
Dietro un grande uomo c'è ..| . |. Et ne m'en veux pas si je te tutoie
sempre uno zaino ...........| ..: |.... Je dis tu à tous ceux que j'aime