So running commands on alh's massively split folder in the new 
conversation-splitting-world that we have on master, I get this annotated 
callgrind.

Commands:

. login
. select INBOX.alerts
. xconvsort (reverse arrival) (conversations position (691 30)) utf-8 all
. logout

1 OK Completed (in 92.756 secs)  (it's about 15 times slower than running 
without callgrind - this is a 6s command usually)


16,154,183,725  /usr/src/cyrus-next-build/cyrus/master/service.c:main 
[/usr/cyrus-next/libexec/imapd]
16,148,984,824  /usr/src/cyrus-next-build/cyrus/imap/imapd.c:service_main 
[/usr/cyrus-next/libexec/imapd]
16,148,289,216  /usr/src/cyrus-next-build/cyrus/imap/imapd.c:cmdloop 
[/usr/cyrus-next/libexec/imapd]
15,826,028,681  
/usr/src/cyrus-next-build/cyrus/imap/mailbox.c:mailbox_reload_index_record 
[/usr/cyrus-next/lib/libcyrus_imap.so.0.0.0]
15,820,943,656  
/usr/src/cyrus-next-build/cyrus/imap/mailbox.c:mailbox_read_index_record 
[/usr/cyrus-next/lib/libcyrus_imap.so.0.0.0]
15,447,150,662  
/usr/src/cyrus-next-build/cyrus/imap/annotate.c:annotatemore_msg_lookup 
[/usr/cyrus-next/lib/libcyrus_imap.so.0.0.0]

And there we have it. We're spending 95% of that time doing annotation lookups 
for split conversations:

    if ((record->system_flags & FLAG_SPLITCONVERSATION)) {
        struct buf annotval = BUF_INITIALIZER;
        annotatemore_msg_lookup(mailbox->name, record->uid, IMAP_ANNOT_NS 
"basethrid", "", &annotval);
        if (annotval.len == 16) {
            const char *p = buf_cstring(&annotval);
            /* we have a new canonical CID */
            r = parsehex(p, &p, 16, &record->basecid);
        }
        buf_free(&annotval);
    }

Which is all well and good, and we could possibly optimise this a bit more, but 
over 50% is just seeking to find the record in the twoskip file.  That's not 
going to optimise much.  Random access lookups on every read.

Alternative pathway ahead here is to change EVERYONE's cyrus.index files to 
have space to store a separate basecid field.  Downside, it adds 8 bytes to 
every single cyrus.index record, regardless of whether they have any split 
conversations.  Upside, fast access and simplicity on the split conversation 
handling.

(we also want to add some other interesting fields at the same time, for 
example a CREATEDDATE which will allow us to implement purge based on when a 
message was added to a folder by a move rather than on its INTERNALDATE).

... another alternative here is to delay reading basecid until we actually need 
it.  There aren't many places that request it.  If we had rsto's soon-to-land 
wrapper around index records, we could probably delay reading until someone 
asks for the basecid.

Bron.

-- 
  Bron Gondwana
  br...@fastmail.fm

Reply via email to