On Sun, Nov 01, 2009 at 04:03:01AM +0000, Olly Betts wrote:
> I downloaded mu-0.4 and had a look.  It is indeed doing a flushed transaction
> for each deleted document, which will be a lot slower than letting Xapian
> batch them up.

OK, thanks for the info.

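To make the cost difference concrete, here is a toy model in plain Python. It is not the Xapian API: the `ToyWritableDB` class, its `flushes` counter and both helper functions are hypothetical. It counts how many times pending changes get written out when each deletion gets its own flushed transaction, compared with batching the deletions and committing once.

```python
class ToyWritableDB:
    """Hypothetical stand-in for a writable index that counts disk flushes."""

    def __init__(self, doc_ids):
        self.docs = set(doc_ids)
        self.pending = []   # deletions not yet written out
        self.flushes = 0    # number of (expensive) writes to disk

    def delete(self, doc_id):
        # Only queue the change; nothing is written yet.
        self.pending.append(doc_id)

    def commit(self):
        # Apply every pending deletion in a single write.
        for doc_id in self.pending:
            self.docs.discard(doc_id)
        self.pending.clear()
        self.flushes += 1


def delete_one_transaction_each(db, ids):
    # Mimics a flushed transaction per document: one flush per deletion.
    for doc_id in ids:
        db.delete(doc_id)
        db.commit()


def delete_batched(db, ids):
    # Let the deletions accumulate and write them out once.
    for doc_id in ids:
        db.delete(doc_id)
    db.commit()


a = ToyWritableDB(range(1000))
delete_one_transaction_each(a, range(100))
b = ToyWritableDB(range(1000))
delete_batched(b, range(100))
print(a.flushes, b.flushes)  # 100 1
```

Both approaches end with the same 900 documents. Only the number of writes differs. In Xapian itself, the batched pattern roughly corresponds to issuing all the `delete_document()` calls and then committing once (`flush()` in the 1.0 series, `commit()` from 1.1 on), instead of wrapping each deletion in `begin_transaction()`/`commit_transaction()`.
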
> The valuerange query to find the document id is also going to slow things down
> a bit (but probably nothing like as much as committing all those transactions),
> and could be potentially problematic since that will be O(n) in database size,
> so deleting 1% of a database will be O(n*n) in database size.

Argh!

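Here is a rough worked estimate of that O(n*n) claim. It assumes each value-range lookup has to examine all n documents and that 1% of the database is deleted. The figure of 100,000 documents below is purely illustrative.

```latex
T(n) \approx \underbrace{\tfrac{n}{100}}_{\text{deletions}} \times \underbrace{n}_{\text{documents scanned per lookup}} = \frac{n^{2}}{100} \in O(n^{2})
```

For n = 100,000 that is 1,000 × 100,000 = 10^8 document visits. Doubling the database to 200,000 documents gives 2,000 × 200,000 = 4 × 10^8, so the work quadruples rather than doubles.
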
> At least deleting recent spam is likely to not be a percentage of the database
> size, but it's not a good behaviour to build into the system.  It would be much
> better to put the unique id in a boolean term, e.g.:
> 
>     document.add_term("Q" + unique_id);
> 
> Then you can just do:
> 
>     db.delete_document("Q" + unique_id);

Thanks a lot, also, for the extra investigation; I'll try to mock up a
patch for both issues.

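The boolean term helps because a term maps directly to its posting list, so finding the document to delete is a lookup rather than a scan over every document's value. Below is a minimal plain-Python sketch of that idea. The `ToyIndex` class and its methods are hypothetical and are not Xapian's API. Like Xapian's `delete_document(unique_term)`, though, it removes every document indexed by the given term.

```python
from collections import defaultdict


class ToyIndex:
    """Hypothetical inverted index: term -> set of document ids."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of docs containing it
        self.doc_terms = {}               # doc id -> terms it was indexed with
        self.next_id = 1

    def add_document(self, terms):
        doc_id = self.next_id
        self.next_id += 1
        self.doc_terms[doc_id] = set(terms)
        for term in terms:
            self.postings[term].add(doc_id)
        return doc_id

    def delete_document(self, unique_term):
        # Fetch the posting list for the term directly (a dict lookup)
        # instead of scanning every document to find a matching value.
        for doc_id in list(self.postings.get(unique_term, ())):
            for term in self.doc_terms.pop(doc_id):
                self.postings[term].discard(doc_id)
                if not self.postings[term]:
                    del self.postings[term]


index = ToyIndex()
for unique_id in ["msg-1", "msg-2", "msg-3"]:
    index.add_document(["Q" + unique_id, "spam"])

index.delete_document("Q" + "msg-2")
print(sorted(index.doc_terms))          # [1, 3]
print(sorted(index.postings["spam"]))   # [1, 3]
```

The "Q" prefix simply keeps unique-id terms apart from ordinary words, mirroring the convention in the quoted suggestion.
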
Cheers.

-- 
Stefano Zacchiroli -o- PhD in Computer Science \ PostDoc @ Univ. Paris 7
z...@{upsilon.cc,pps.jussieu.fr,debian.org} -<>- http://upsilon.cc/zack/
Behind a great man there ...|  .  |. And don't hold it against me if I call you "tu"
is always a backpack .......| ..: |.... I say "tu" to everyone I love
