Ok, here are my thoughts about how to do faster updates, ie: how
to release rules + scores faster, potentially multiple times a day.
I currently think only rules + scores ought to be released this way --
people aren't going to be comfortable with automated code updates IMO.
Code/plugins are best left to full releases.  (Plugin support could be
easily added later on, btw.)

Pseudo-code is below, but here are some background details:

Updates occur from "channels".  The default channel is
"updates.spamassassin.org", but the user can specify any number of
additional channels on the commandline.  These can either be
provided by us (think of "updates" being stable vs "experimental" vs ...),
or by some third party (as long as they provide the same infrastructure...)

Updates have version numbers.  The format of the value is irrelevant,
as long as it's monotonically increasing.  For our updates I was thinking
SVN revision, but could also do YYYYMMDDVV ala DNS SOA, etc.

Versions are tracked per channel and SpamAssassin version.  To check
for updates, do a DNS TXT query ala "z.y.x.updates.spamassassin.org",
where z.y.x is the version of SpamAssassin being used with its parts
reversed, ie: 2.0.3 for version 3.0.2, etc.  For simplicity, wildcards
can be used on the DNS server to match a whole set of releases.  An
example:

*.0.3.updates.spamassassin.org TXT "154203"
*.1.3.updates.spamassassin.org TXT "158203"

I haven't decided if that needs to be more machine parsable for future
expansion.  ie: "v=1 ver=154023 ...."   I can't think of anything off hand
that would need to go in there so just a version number is probably ok.
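
As a rough illustration of the lookup name only (no DNS query is made
here), a minimal Python sketch follows.  The function name
update_query_name is made up for this example and isn't part of any
existing code:

```python
def update_query_name(sa_version, channel="updates.spamassassin.org"):
    """Build the TXT query name: reversed version parts, then the channel.

    Hypothetical helper for illustration; e.g. SA version "3.0.2" on the
    default channel gives "2.0.3.updates.spamassassin.org".
    """
    parts = sa_version.split(".")
    return ".".join(reversed(parts)) + "." + channel


if __name__ == "__main__":
    print(update_query_name("3.0.2"))
    print(update_query_name("3.1.0"))
```

With the wildcard records above, the first name would match
*.0.3.updates.spamassassin.org and the second would match
*.1.3.updates.spamassassin.org.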

For the initial request, mirrors.channel is a TXT record with a URL for
the MIRRORED.BY file (ie: http://spamassassin.apache.org/updates/MIRRORED.BY),
which contains a list of parent URLs, and an optional list of options
per mirror.  ie:

http://spamassassin.apache.org/updates weight=20
http://spamassassin.kluge.net/updates
http://somemirror.example.com/spamassassin/updates weight=4

This means there are 3 mirrors with a total weight of 20 + 1 + 4 = 25,
so the apache.org one will be used the most (20/25 = 80% of the time),
followed by the example.com one (4/25 = 16% of the time), followed by
the kluge.net one (1/25 = 4% of the time).  The default weight is '1',
btw.
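
To make the weighting concrete, here's a minimal Python sketch.  The
helper names (parse_mirrors, selection_odds, pick_mirror) are made up
for this example, and it assumes one mirror per line with
whitespace-separated key=value options.  It uses a weighted random
choice, which gives the same odds as adding each URL to an array
"weight" times:

```python
import random


def parse_mirrors(text):
    """Return a list of (url, weight) pairs; the weight defaults to 1."""
    mirrors = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        weight = 1
        for option in fields[1:]:
            key, _, value = option.partition("=")
            if key == "weight":
                weight = int(value)
        mirrors.append((fields[0], weight))
    return mirrors


def selection_odds(mirrors):
    """Map each URL to the fraction of the time it should be picked."""
    total = sum(weight for _, weight in mirrors)
    return {url: weight / total for url, weight in mirrors}


def pick_mirror(mirrors, rng=random):
    """Pick one URL at random, honoring the weights."""
    urls = [url for url, _ in mirrors]
    weights = [weight for _, weight in mirrors]
    return rng.choices(urls, weights=weights, k=1)[0]


EXAMPLE = """\
http://spamassassin.apache.org/updates weight=20
http://spamassassin.kluge.net/updates
http://somemirror.example.com/spamassassin/updates weight=4
"""

if __name__ == "__main__":
    mirrors = parse_mirrors(EXAMPLE)
    for url, odds in selection_odds(mirrors).items():
        print(f"{odds:.0%} {url}")
    print("picked:", pick_mirror(mirrors))
```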

The directory that is to be mirrored out looks like:

dir/
        MIRRORED.BY
        version.ext
        version.ext.sha1
        ...
        versionn.ext
        versionn.ext.sha1

with "version.ext.gpg .. versionn.ext.gpg" available optionally.
I don't think GPG needs to be required, but for the paranoid amongst us,
it needs to be available as an option.
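
For the SHA1 side, a minimal Python sketch of the check is below.  The
contents of the .sha1 file aren't specified above, so this assumes the
hex digest is the first whitespace-separated token in the file (as in
sha1sum output); the function name sha1_matches is also made up:

```python
import hashlib


def sha1_matches(archive_bytes, sha1_file_text):
    """Compare the archive's SHA1 to the digest from the .sha1 file.

    Assumption: the .sha1 file holds the hex digest as its first
    whitespace-separated token, optionally followed by a filename.
    """
    fields = sha1_file_text.split()
    if not fields:
        return False  # empty .sha1 file, treat as a failed check
    expected = fields[0].lower()
    actual = hashlib.sha1(archive_bytes).hexdigest()
    return actual == expected
```

A mismatch (or an empty .sha1 file) returns False, which would map to
"fail channel" in the pseudo code.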

At the end, the script outputs a number of channel.cf files, which by
default will just be read by SpamAssassin at startup (leaving restarting
spamd up to the admin outside the script, based on exit code...)  If a
different directory is used, admin can simply include the channel.cf
file in their local.cf.
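
A minimal Python sketch of generating such a channel.cf follows.  The
exact text of the version comment isn't decided anywhere in this
proposal, so "# UPDATE version N" is just an assumption for
illustration, as is the function name build_channel_cf:

```python
import os


def build_channel_cf(channel_dir, version):
    """Return channel.cf text: a version comment, then one include per .cf.

    Assumption: the first-line comment format is "# UPDATE version N".
    """
    lines = [f"# UPDATE version {version}"]
    for name in sorted(os.listdir(channel_dir)):
        if name.endswith(".cf"):  # only .cf files get included
            lines.append(f"include {os.path.join(channel_dir, name)}")
    return "\n".join(lines) + "\n"
```

The file names are sorted here so the includes come out in a stable
order; files such as MIRRORED.BY are skipped.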

There are a few things I haven't fully fleshed out yet:

1) How to archive the update files together?  I envisioned a similar
naming convention to our normal rules directory (ie: a bunch of files
named ##_type.cf), but the script should just expect to download a single
file which will then be expanded.  I don't want to rely on system calls to
run an expansion, nor do I want to expect tar or zip to be installed, etc.

2) How to validate with GPG?  Similar to the archive issue.  Perhaps using
GnuPG::Interface?  It's really just a wrapper to running gpg from the
commandline, but at least abstracts the issue for platforms where "gpg" isn't
what I think it is.

3) Using "channel.cf" means that it may or may not come after local.cf.
We should probably use some form of prefix to get it to load beforehand,
but what?  People should be able to override the channel config if
they want to.  I don't know if I want "AA_updates_spamassassin_org.cf"
as a file.


Pseudo code:

- Script has a list of GPG keys which are allowed to sign update releases.
  The default is 265FA05B, which is the SA signing key.
- load Mail::SpamAssassin
- load Digest::SHA1
- load LWP
- Accept commandline options for GPG keys to allow for signing in addition
  to default (for third-party updates).
- Accept commandline option for whether or not to use GPG for verification.
- Accept commandline options for additional channels to use beyond
  updates.spamassassin.org
- Accept commandline option for parent directory for updates.  Default is
  whatever the first site_rules_path value is, ie: /etc/mail/spamassassin.
  ala: $msa->first_existing_path (@M::SA::site_rules_path);
- Accept other options such as debug, version, etc.
- exit code = 255
- foreach ( @channels ):
  - Convert channel name to "platform friendly" version?  Is
    "foo.bar.baz.etc.example.com" ok for all platforms?  I was thinking
    s/\./_/g
  - read /dir/channel.cf and get current version from comment on first line
  - convert internal SA version to z.y.x format, and query DNS for
    TXT z.y.x.channel
  - if no answer, throw error, goto next channel
  - for version checks, use ^(\d+) for version.  if the same channel will
    have the same update version value for different SA versions, the TXT
    value can be something like "1345-3_0".
  - if version is <= current, goto next channel
  - if no /dir/channel/MIRRORED.BY file exists:
    - query DNS for TXT mirrors.channel
    - if no answer, throw error, goto next channel
    - grab URI, write to /dir/channel/MIRRORED.BY
  - read /dir/channel/MIRRORED.BY:
    - add each parent URI to internal array.  if weight given, add URI that
      many times.  (this algorithm can be made more efficient, but it's simple
      for now.)
  - foreach ( pick_random(@mirrors) ):
    - grab parent_uri/version.foo ("foo" depends on the "what archive
      method" issue)
      - if there's an error, go back and choose another mirror
    - grab parent_uri/version.foo.sha1 (ditto foo)
    - do IMS grab for parent_uri/MIRRORED.BY, missing is ok
    - if GPG is enabled, grab parent_uri/version.foo.gpg (ditto foo)
    - an error in either GPG or SHA1 causes an error for the channel, goto
      next channel
    - no error means break out of the mirror loop
    - write files to some temp place (mkdir tmpfile)
    - if no mirrors work completely, channel fails, goto next channel
  - validate version.foo.sha1 internally
    - if failed, fail channel, goto next channel
  - if GPG is enabled, validate version.foo.gpg (depends on the "how to do
    gpg" issue)
    - if failed, fail channel, goto next channel
    - file fails if signature fails, or if signature is ok but not signed by
      list of "trusted" keys
  - remove all files except MIRRORED.BY from /dir/channel
  - remove /dir/channel.cf
  - unarchive version.foo into /dir/channel
    - on error, fail channel, goto next channel
  - move new MIRRORED.BY to /dir/channel if it exists
  - remove temp version.foo* files
  - create new /dir/channel.cf file
    - first line is comment w/ version of channel
    - foreach (readdir(/dir/channel)):
      - add "include /dir/channel/file.cf", only do .cf files
  - exit code = 0
- return exit code
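
The channel name conversion and the version check from the pseudo code
can be sketched in Python as below.  The function names are made up
for this example, and treating a missing current version (no
channel.cf yet) as "update needed" is an assumption, not something
stated above:

```python
import re


def channel_file_stem(channel):
    """Apply the s/\\./_/g idea to get a platform-friendly name."""
    return channel.replace(".", "_")


def parse_update_version(txt_value):
    """Take the leading digits of the TXT value, ie: ^(\\d+)."""
    match = re.match(r"(\d+)", txt_value)
    return int(match.group(1)) if match else None


def needs_update(current_version, txt_value):
    """True if the channel's advertised version is newer than ours.

    Assumption: current_version is None when no channel.cf exists yet,
    in which case any valid advertised version counts as newer.
    """
    available = parse_update_version(txt_value)
    if available is None:
        raise ValueError(f"no version found in TXT value: {txt_value!r}")
    return current_version is None or available > current_version
```

A value like "1345-3_0" parses as version 1345, so the suffix can vary
per SA version without affecting the comparison.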
