One of the perennial topics on #cyrus is "what about a more configurable set of 
cached headers".

There are a couple of things to say about that.  One is that the current cache 
format is "interesting".  Here's my "dumper" output for one record:

------------------------------------------------
ENVELOPE: ("Sun, 01 Jan 2012 06:00:01 +0300" "jabber.ru mailing list 
memberships reminder" ((NIL NIL "mailman-owner" "jabber.ru")) ((NIL NIL 
"mailman-bounces" "jabber.ru")) ((NIL NIL "mailman-owner" "jabber.ru")) ((NIL 
NIL "brong" "fastmail.fm")) NIL NIL NIL 
"<mailman.137.1325383201.14742.mail...@jabber.ru>")
BODYSTRUCTURE: ("TEXT" "PLAIN" ("CHARSET" "us-ascii") NIL NIL "7BIT" 1070 23 
NIL NIL NIL NIL)
BODY: ("TEXT" "PLAIN" ("CHARSET" "us-ascii") NIL NIL "7BIT" 1070 23)
SECTION: 0:(0:2218 2218:1070 4294901760) (0:2218 2218:1070 0) ()
HEADERS: X-Spam-score: 2.4
X-Spam-hits: BAYES_50 0.8, DCC_CHECK 1.5, RP_MATCHES_RCVD 0.1, BAYES_USED user,
  SA_VERSION 3.3.1
X-Spam-source: IP='79.137.226.13', Host='mx.jabber.ru', Country='RU', 
FromHeader='ru',
  MailFrom='ru'
X-Resolved-to: br...@fastmail.fm
X-Delivered-to: br...@fastmail.fm
X-Mail-from: mailman-boun...@jabber.ru
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Message-ID: <mailman.137.1325383201.14742.mail...@jabber.ru>
Precedence: bulk
List-Id: <mailman.jabber.ru>
Errors-To: mailman-boun...@jabber.ru
X-Truedomain-Domain: jabber.ru
X-Truedomain-SPF: Pass
X-Truedomain-DKIM: Pass
X-Truedomain: Neutral

FROM: <mailman-ow...@jabber.ru>
TO: <br...@fastmail.fm>
CC: 
BCC: 
SUBJECT: "jabber.ru mailing list memberships reminder"
------------------------------------------------

As you can see, some headers are stored in a normalised form.  The same 
information is normalised in a DIFFERENT way in the ENVELOPE, and then there's 
a BODYSTRUCTURE and a BODY response as well.

We have already changed the normalisation rules here a couple of times.

There are two benefits to caching this data.

1: reduced CPU usage re-parsing the fields for fast responses.
2: reduced IO because .cache files are a single file, so readahead benefits 
apply.

Really, "2" is the only thing of value these days.  Pretty much the entire 
benefit of the cyrus.cache is reduced IO compared to mapping in each message 
file.

So - I would propose this:

1) keep the BODYSTRUCTURE, it's the result of parsing the entire message, and 
can't be calculated cheaply again
2) keep the SECTION data (possibly along with the bodystructure) - it's the 
offsets for the various parts of the message, same issue
3) add a list of "SUPPRESSED HEADERS".  This would list any header which is 
present in the file, but NOT in the cache.
4) cache every other header, including all the To:, From:, Subject:, etc - in 
as close to raw form as possible.

The entire list of headers to suppress would initially be:

received
dkim-signature
domainkey-signature
domainkey-x509

But it would be configurable as an imapd.conf option.
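
To make that concrete, here is a rough sketch in Python of how a cache record's 
header part could be built under this proposal.  It is not Cyrus code: the 
function names are made up for illustration, and the suppress list is simply 
passed in as a parameter rather than read from any real imapd.conf option.

```python
# Illustrative only: names here are hypothetical, not part of Cyrus.
DEFAULT_SUPPRESS = (
    "received",
    "dkim-signature",
    "domainkey-signature",
    "domainkey-x509",
)


def split_header_fields(raw):
    """Split a raw header block into (lowercased name, raw field text) pairs.

    Folded continuation lines (starting with space or tab) stay attached to
    the field they belong to, so the raw text is preserved as-is.
    """
    fields = []
    for line in raw.splitlines(keepends=True):
        if line[:1] in (" ", "\t") and fields:
            name, text = fields[-1]
            fields[-1] = (name, text + line)
        elif ":" in line:
            name = line.split(":", 1)[0].strip().lower()
            fields.append((name, line))
    return fields


def build_cache_headers(raw, suppress=DEFAULT_SUPPRESS):
    """Return (raw headers to cache, names of suppressed headers present)."""
    suppress = {s.lower() for s in suppress}
    kept = []
    suppressed = []
    for name, text in split_header_fields(raw):
        if name in suppress:
            # Record only that the header exists in the file, not its value.
            if name not in suppressed:
                suppressed.append(name)
        else:
            kept.append(text)
    return "".join(kept), suppressed
```

Every header not on the suppress list is stored byte-for-byte, so a later 
change to the list only affects newly written records.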

NOTE: you can still infer the presence or absence of any header just by 
checking the suppressed list - for many messages the entire suppressed list 
would just be 'received'.
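
A minimal sketch of that lookup logic, again in Python and with hypothetical 
names (lookup_header, and a cache record modelled as a list of name/value 
pairs plus a set of suppressed names):

```python
def lookup_header(name, cached_headers, suppressed):
    """Decide whether the cache alone can answer a header lookup.

    cached_headers: list of (name, value) pairs stored in the cache record.
    suppressed: set of lowercased header names present in the message file
                but deliberately left out of the cache.

    Returns ("cache", values) when the cache is authoritative (an empty list
    means the header is absent from the message), or ("file", None) when the
    message file has to be read.
    """
    wanted = name.lower()
    if wanted in suppressed:
        return ("file", None)
    values = [v for n, v in cached_headers if n.lower() == wanted]
    return ("cache", values)
```

The point is that a header missing from both the cached headers and the 
suppressed list is known to be absent, without opening the message file.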

This should take fairly similar space to what we have now, be more flexible, 
and be more future-proof.  No matter how you want to parse the fields, the 
original values are what you've got!  Even if you change the list of headers 
you suppress, each cache record is complete in itself, so there's no loss of 
fidelity.

It means a little more CPU to calculate the ENVELOPE, but seriously... I don't 
think it's a worry in the current world, and it's not so commonly requested 
anyway.
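
For a feel of the extra work involved, here is a small Python sketch that 
turns a raw address header value into ENVELOPE-style address tuples (personal 
name, source route, mailbox, host, as in RFC 3501).  It leans on the standard 
library's email.utils.getaddresses; the function name envelope_addresses and 
the example.com address are made up, and None stands in for NIL.

```python
from email.utils import getaddresses


def envelope_addresses(raw_value):
    """Parse a raw From/To/Cc header value into ENVELOPE-style tuples.

    Each tuple is (personal name, source route, mailbox, host), with None
    where IMAP would send NIL.  The source route is always None here.
    """
    result = []
    for realname, addr in getaddresses([raw_value]):
        if not addr:
            continue  # skip anything that did not parse as an address
        mailbox, _, host = addr.rpartition("@")
        if not mailbox:
            # No "@" in the address: treat the whole thing as the mailbox.
            mailbox, host = addr, None
        result.append((realname or None, None, mailbox, host))
    return result


addresses = envelope_addresses('"Bron Gondwana" <brong@example.com>')
```

This runs per request instead of once at delivery time, which is the extra 
CPU cost being traded for keeping the raw values in the cache.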

=====

Thoughts?

Bron.
-- 
  Bron Gondwana
  br...@fastmail.fm
