Update - the meeting is happening at 9pm Melbourne time! I see I set it to always be at the same UTC time.

On Sun, Apr 3, 2016, at 22:21, Bron Gondwana via Cyrus-devel wrote:
> (this is a discussion piece for talking about in tomorrow's meeting,
> which IS happening at the regular 10pm Melbourne time - that's now
> ANOTHER hour later for everyone due to timezones changing. I
> haven't written any code yet)
>
> On top of Robert's work to support libicu for charset conversion and
> pick up all the rest of the character sets it supports, we need to make
> some cache format changes.
>
> I also have a user at FastMail with a 3.8 million message "Deleted
> Messages" folder, and I can't keep manually splitting giant folders for
> people just because their cyrus.cache file gets over 4 gig.
>
> So I'm proposing the following changes:
>
> cyrus.index version 14:
>
> cyrus.index header:
>
> * LAST_APPEND_DATE: 32 bit => 64 bit time_t
> * POP3_LAST_LOGIN: 32 bit => 64 bit time_t
> * LEAKED_CACHE: remove
> * FIRST_EXPUNGED: 32 bit => 64 bit time_t
> * LAST_REPACK_TIME: 32 bit => 64 bit time_t
> * HEADER_FILE_CRC: remove
> * RECENT_TIME: 32 bit => 64 bit time_t
> * POP3_SHOW_AFTER: 32 bit => 64 bit time_t
> * add UNIQUEID: 40 characters (enough space for a uuidgen UUID or
>   whatever)
> * add a bunch of space for un-fixed-width quotaroot and flag names.
>
> By doing this, we no longer have a separate cyrus.header and
> cyrus.index. We only have ONE file in which facts are stored (except
> cyrus.annotations, but I have plans for that too).
>
> If the non-fixed data gets too big then we create a new file called
> cyrus.indexoverflow which contains just the non-fixed data. This is
> another 99%/1% case. In 99% of cases we won't create enough (i.e. long
> flag names) to fill the space. If we fix the header size at 2048 bytes,
> we save in the common case of an almost empty mailbox, while still
> working for huge mailboxes.
>
> There's a mailbox options flag to say to read from the
> indexoverflow file.
>
> ACL is no longer stored in this file. It's not a property of the
> mailbox in any meaningful way - it belongs out in mailboxes.db and the
> next layer up (eventually).
>
> mailboxname probably will get stored in the mailbox later, when we store
> on disk by uniqueid, but that's another yak to shave.
>
>
> cyrus.index record:
>
> * INTERNALDATE: change 32 bit => 64 bit time_t
> * GMTIME: change 32 bit => 64 bit time_t
> * SENTDATE: remove (moved to cache)
> * HEADER_SIZE: remove (moved to cache)
> * LAST_UPDATED: change 32 bit => 64 bit time_t
> * CONTENT_LINES: remove (moved to cache)
> * CACHE_CRC: remove (moved to cache)
> * CACHE_VERSION: remove (moved to cache)
> * Add: CACHE_FILE_NUMBER (32 bit)
>
> Basically I want to remove everything except GMTIME that's derived from
> the message out of cyrus.index. cyrus.index is about remembering FACTS
> about the mailbox which aren't available anywhere else. It's very
> important data.
>
> cyrus.cache is all re-creatable from the raw messages.
>
> The reason to keep gmtime is that it's quite common to SORT by sent
> date, and making that possible without loading cache is a worthwhile
> optimisation.
>
> ...
>
> cyrus.cache format changes:
>
> 1) there's a section in the unstructured data for CACHEACTIVE,
>    which contains a list of (NUM VERSION FLAGS SIZE DIRTYBYTES) -
>    probably binary encoded to save space as b32 b16 b16 b32 b32 =>
>    128 bits per file.
>
>    e.g. (3 5 0 1894322 1647)
>
> 2) each cyrus.cache file starts with the NUM VERSION FLAGS triple, and
>    maybe even the SIZE and DIRTYBYTES as well, it wouldn't hurt to
>    update them after appending new records.
>
> 3) each cyrus.cache record has structure:
>    * CACHE_ITEM_LEN 32 bit
>    * CACHE_VERSION 32 bit
>    * SENTDATE 64 bit time_t
>    * HEADER_SIZE 32 bit
>    * CONTENT_LINES 32 bit
>    * (existing fields with their individual structure)
>    * <pad to multiple of 8 bytes>
>    * CACHE_ITEM_CRC32 32 bit
>
>
> On disk the file names are cyrus.cache.N, e.g. cyrus.cache.3
>
> New records are always added to the FIRST active cache file that matches
> the criteria of the record, aka if it's ARCHIVED then the first cache
> file with the ARCHIVE bit set.
>
> If a cache file gets too big (compile time option, probably 100
> megabytes or so) then a new file with the next unused number gets
> created and added to the start of the list.
>
> During cyr_expire, if a cache file is more than a configured amount
> "dirty" then the records get copied to a newer file and their associated
> index records updated to the new locations. Once it's unreferenced, it
> can be safely deleted.
>
> During a normal repack, if most records are being kept, then the
> cyrus.cache files will be untouched, saving on IO.
>
> .....
>
> This is all backwards compatible. Earlier cyrus.index versions will
> write just a single cache file. The upgrade and downgrade facilities
> will still work, and convert just fine. All the existing reading code
> will stay.
>
> I'll convert Robert's cache format change code to also be able to write
> the old style (or "unknown" if the charset isn't one of the ones with a
> numeric code) values for old cache files.
>
> Woohoo. No more 64 bit nastiness, reduced cache IO in the common case,
> and a savings of 4096 bytes (one file) per mailbox from the super-hot
> index location in the common case.
>
> Bron.
>
>
> --
>   Bron Gondwana
>   br...@fastmail.fm

--
  Bron Gondwana
  br...@fastmail.fm
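
For anyone who wants to poke at the sizes before the meeting, the CACHEACTIVE entry and the per-record layout proposed above can be sketched in a few lines of Python. This is only an illustration, not Cyrus code. Everything beyond the field names and widths is an assumption: big-endian byte order, CACHE_ITEM_LEN covering the whole record including padding and CRC, the CRC being a zlib-style CRC32 over every byte before it, and the helper names (pack_cacheactive, pack_record, unpack_record), which are made up for this example.

```python
import struct
import zlib

# Assumption: big-endian fields. The proposal only specifies the widths.
# NUM(32) VERSION(16) FLAGS(16) SIZE(32) DIRTYBYTES(32) => 128 bits per file
CACHEACTIVE_ENTRY = struct.Struct(">IHHII")
# CACHE_ITEM_LEN(32) CACHE_VERSION(32) SENTDATE(64) HEADER_SIZE(32) CONTENT_LINES(32)
RECORD_PREFIX = struct.Struct(">IIqII")
CRC = struct.Struct(">I")


def pack_cacheactive(num, version, flags, size, dirtybytes):
    """Encode one (NUM VERSION FLAGS SIZE DIRTYBYTES) entry."""
    return CACHEACTIVE_ENTRY.pack(num, version, flags, size, dirtybytes)


def pack_record(cache_version, sentdate, header_size, content_lines, existing_fields):
    """Build one cache record: prefix, existing fields, zero padding, CRC32.

    Assumption: the padding rounds everything before the CRC up to a
    multiple of 8 bytes, following the field order in the proposal.
    """
    body_len = RECORD_PREFIX.size + len(existing_fields)
    pad = -body_len % 8
    total_len = body_len + pad + CRC.size
    body = RECORD_PREFIX.pack(total_len, cache_version, sentdate,
                              header_size, content_lines)
    body += existing_fields + b"\0" * pad
    return body + CRC.pack(zlib.crc32(body))


def unpack_record(buf):
    """Check the CRC and return (version, sentdate, header_size, content_lines)."""
    total_len, version, sentdate, header_size, content_lines = \
        RECORD_PREFIX.unpack_from(buf)
    crc_offset = total_len - CRC.size
    (stored_crc,) = CRC.unpack_from(buf, crc_offset)
    if zlib.crc32(buf[:crc_offset]) != stored_crc:
        raise ValueError("cache record CRC mismatch")
    return version, sentdate, header_size, content_lines


if __name__ == "__main__":
    entry = pack_cacheactive(3, 5, 0, 1894322, 1647)
    print(len(entry) * 8, "bits per CACHEACTIVE entry")
    record = pack_record(5, 1459722060, 812, 40, b"abcde")
    print(unpack_record(record))
```

Under these assumptions the fixed prefix of each record is 24 bytes, and the example entry (3 5 0 1894322 1647) packs into exactly 16 bytes, matching the 128 bits per file quoted above. Because the cache is re-creatable from the raw messages, a CRC mismatch on read would presumably just trigger a re-parse rather than be a fatal error, but that behaviour is outside this sketch.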