Ok, here are my thoughts about how to do faster updates, i.e. how to release rules + scores faster, potentially multiple times a day. I currently think only rules + scores ought to be released this way -- people aren't going to be comfortable with automated code updates, IMO. Code/plugins are best left to full releases. (Plugin support could easily be added later on, btw.)
Pseudo-code is below, but here are some background details:

Updates occur from "channels". The default channel is "updates.spamassassin.org", but the user can specify any number of additional channels on the commandline. These can be provided either by us (think of "updates" being stable vs "experimental" vs ...) or by some third party (as long as they provide the same infrastructure...).

Updates have version numbers. The format of the value is irrelevant, as long as it's monotonically increasing. For our updates I was thinking SVN revision, but we could also do YYYYMMDDVV ala DNS SOA, etc. Versions are tracked per channel and SpamAssassin version.

To check for updates, do a DNS TXT query ala "z.y.x.updates.spamassassin.org", where z.y.x is the version of SpamAssassin being used, reversed: x.y.z is 3.0.2, so the query is for 2.0.3.updates.spamassassin.org, etc. For simplicity, wildcards can be used on the DNS server to match a whole set of releases. An example:

  *.0.3.updates.spamassassin.org TXT "154203"
  *.1.3.updates.spamassassin.org TXT "158203"

I haven't decided if that needs to be more machine parsable for future expansion, i.e. "v=1 ver=154023 ....". I can't think of anything offhand that would need to go in there, so just a version number is probably ok.

For the initial request, mirrors.channel is a TXT record with a URL for the MIRRORED.BY (i.e. http://spamassassin.apache.org/updates/MIRRORED.BY), which contains a list of parent URLs and an optional list of options per mirror, i.e.:

  http://spamassassin.apache.org/updates weight=20
  http://spamassassin.kluge.net/updates
  http://somemirror.example.com/spamassassin/updates weight=4

This means there are 3 mirrors, weighted so the apache.org one will be used the most (80% of the time), followed by the example.com one (16% of the time), followed by the kluge.net one (4% of the time). The default weight is 1, btw.

The directory to be mirrored looks like:

  dir/
    MIRRORED.BY
    version.ext
    version.ext.sha1
    ...
    versionn.ext
    versionn.ext.sha1

with "version.ext.gpg .. versionn.ext.gpg" available optionally. I don't think GPG needs to be required, but for the paranoid amongst us it needs to be available as an option.

At the end, the script outputs a number of channel.cf files, which by default will just be read by SpamAssassin at startup (leaving restarting spamd up to the admin outside the script, based on the exit code...). If a different directory is used, the admin can simply include the channel.cf file in their local.cf.

There are a few things I haven't fully fleshed out yet:

1) How to archive the update files together? I envisioned a naming convention similar to our normal rules directory (i.e. a bunch of files named ##_type.cf), but the script should just expect to download a single file which will then be expanded. I don't want to rely on system calls to run an expansion, nor do I want to expect tar or zip to be installed, etc.

2) How to validate with GPG? Similar to the archive issue. Perhaps using GnuPG::Interface? It's really just a wrapper around running gpg from the commandline, but at least it abstracts the issue for platforms where "gpg" isn't what I think it is.

3) Using "channel.cf" means that it may or may not come after local.cf. We should probably use some form of prefix to get it to load beforehand, but what? People should be able to override the channel config if they want to. I don't know if I want "AA_updates_spamassassin_org.cf" as a file.

Pseudo code:

- Script has a list of GPG keys which are allowed to sign update releases. The default is 265FA05B, which is the SA signing key.
- load Mail::SpamAssassin
- load Digest::SHA1
- load LWP
- Accept commandline options for additional GPG keys to allow for signing, beyond the default (for third-party updates).
- Accept commandline option for whether or not to use GPG for verification.
- Accept commandline options for additional channels to use beyond updates.spamassassin.org
- Accept commandline option for parent directory for updates. Default is whatever the first site_rules_path value is, i.e. /etc/mail/spamassassin. ala: $msa->first_existing_path (@M::SA::site_rules_path);
- Accept other options such as debug, version, etc.
- exit code = 255
- foreach ( @channels ):
  - Convert channel name to "platform friendly" version? Is "foo.bar.baz.etc.example.com" ok for all platforms? I was thinking s/\./_/g
  - read /dir/channel.cf and get current version from comment on first line
  - convert internal SA version to z.y.x format, and query DNS for TXT z.y.x.channel
    - if no answer, throw error, goto next channel
    - for version checks, use ^(\d+) for version. if the same channel will have the same update version value for different SA versions, can do "1345-3_0".
    - if version is <= current, goto next channel
  - if no /dir/channel/MIRRORED.BY file exists:
    - query DNS for TXT mirrors.channel
      - if no answer, throw error, goto next channel
    - grab URI, write to /dir/channel/MIRRORED.BY
  - read /dir/channel/MIRRORED.BY:
    - add each parent URI to internal array. if weight given, add URI that many times. (this algorithm can be made more efficient, but it's simple for now.)
  - foreach ( pick_random(@mirrors) ):
    - grab parent_uri/version.foo ("foo" depends on the "what archive method" issue)
      - if there's an error, go back and choose another mirror
    - grab parent_uri/version.foo.sha1 (ditto foo)
    - do IMS (If-Modified-Since) grab for parent_uri/MIRRORED.BY, missing is ok
    - if GPG is enabled, grab parent_uri/version.foo.gpg (ditto foo)
    - an error in either GPG or SHA1 causes an error for the channel, goto next channel
    - no error means break out of the mirror loop
    - write files to some temp place (mkdir tmpfile)
  - if no mirrors work completely, channel fails, goto next channel
  - validate version.foo.sha1 internally
    - if failed, fail channel, goto next channel
  - if GPG is enabled, validate version.foo.gpg (depends on the "how to do gpg" issue)
    - if failed, fail channel, goto next channel
    - file fails if signature fails, or if signature is ok but not signed by list of "trusted" keys
  - remove all files except MIRRORED.BY from /dir/channel
  - remove /dir/channel.cf
  - unarchive version.foo into /dir/channel
    - on error, fail channel, goto next channel
  - move new MIRRORED.BY to /dir/channel if it exists
  - remove temp version.foo* files
  - create new /dir/channel.cf file
    - first line is comment w/ version of channel
    - foreach (readdir(/dir/channel)):
      - add "include /dir/channel/file.cf", only do .cf files
  - exit code = 0
- return exit code
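To make the DNS check concrete, here is a rough sketch of the name construction and version comparison. It's in Python purely for illustration (the script itself would presumably be Perl), the function names are made up, and the actual TXT lookup is left out so the example doesn't depend on any resolver library.

```python
import re


def update_query_name(sa_version, channel):
    """Reverse the dotted SA version (x.y.z -> z.y.x) and prepend it to the channel."""
    reversed_version = ".".join(reversed(sa_version.split(".")))
    return f"{reversed_version}.{channel}"


def parse_update_version(txt_payload):
    """Return the leading integer of a TXT payload (the ^(\\d+) rule), or None."""
    match = re.match(r"(\d+)", txt_payload.strip().strip('"'))
    return int(match.group(1)) if match else None


def needs_update(current, available):
    """True only if DNS answered and the advertised version is newer than ours."""
    if available is None:
        return False
    return current is None or available > current


name = update_query_name("3.0.2", "updates.spamassassin.org")
available = parse_update_version('"154203"')
```

Because only the leading digits are compared, a payload like "1345-3_0" is treated as version 1345, which is what the ^(\d+) rule in the pseudo-code implies.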
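The MIRRORED.BY handling could look something like the sketch below (again Python for illustration only; all function names are hypothetical). It uses the simple "add the URI weight-many times" approach from the pseudo-code, then shuffles and de-duplicates so each mirror is tried at most once. I'm assuming options are whitespace-separated key=value pairs after the URL, as in the example above.

```python
import random


def parse_mirrored_by(text):
    """Parse MIRRORED.BY lines into (url, weight) pairs; weight defaults to 1."""
    mirrors = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        url, options = fields[0], fields[1:]
        weight = 1
        for option in options:
            key, _, value = option.partition("=")
            if key == "weight":
                weight = int(value)
        mirrors.append((url, weight))
    return mirrors


def weighted_pool(mirrors):
    """Repeat each URL 'weight' times, per the simple algorithm in the pseudo-code."""
    pool = []
    for url, weight in mirrors:
        pool.extend([url] * weight)
    return pool


def mirrors_in_try_order(mirrors, rng=random):
    """Yield every mirror once, in a weighted random order."""
    pool = weighted_pool(mirrors)
    rng.shuffle(pool)
    seen = set()
    for url in pool:
        if url not in seen:
            seen.add(url)
            yield url
```

With the three example mirrors the pool has 25 entries (20 + 1 + 4), so the first mirror tried is apache.org 20/25 = 80% of the time. If that mirror errors out, the caller just moves on to the next one the generator yields.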
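For the "validate version.foo.sha1 internally" step, a minimal sketch in Python (illustration only; the real thing would use Digest::SHA1). The layout of the .sha1 file isn't pinned down above, so this assumes the hex digest is the first whitespace-separated token, the way sha1sum writes it. The function name is made up.

```python
import hashlib


def sha1_matches(archive_bytes, sha1_file_text):
    """Compare the archive's SHA1 against the digest in the .sha1 file.

    Assumption: the digest is the first whitespace-separated token in the file.
    """
    tokens = sha1_file_text.split()
    if not tokens:
        return False  # an empty .sha1 file can never validate
    expected = tokens[0].lower()
    actual = hashlib.sha1(archive_bytes).hexdigest()
    return actual == expected
```

A False result would fail the channel and move on to the next one, as in the pseudo-code.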
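Finally, a sketch of writing and re-reading channel.cf (Python for illustration; names are hypothetical). The exact wording of the first-line version comment isn't decided above, so "# UPDATE version N" is just a placeholder I picked, and sorting the includes is my assumption so that ##_type.cf files load in numeric-prefix order.

```python
import re


def platform_friendly(channel):
    """Equivalent of s/\\./_/g on the channel name."""
    return channel.replace(".", "_")


def render_channel_cf(channel_dir, version, filenames):
    """Build channel.cf: a version comment, then one include per .cf file."""
    lines = [f"# UPDATE version {version}"]  # placeholder comment format
    for name in sorted(filenames):  # sorted order is an assumption
        if name.endswith(".cf"):
            lines.append(f"include {channel_dir}/{name}")
    return "\n".join(lines) + "\n"


def read_channel_version(cf_text):
    """Recover the current version from the first-line comment, or None."""
    first_line = cf_text.splitlines()[0] if cf_text else ""
    match = re.match(r"#\s*UPDATE version (\d+)", first_line)
    return int(match.group(1)) if match else None


channel_dir = "/etc/mail/spamassassin/" + platform_friendly("updates.spamassassin.org")
cf_text = render_channel_cf(channel_dir, 154203, ["20_body.cf", "MIRRORED.BY", "10_misc.cf"])
```

Reading the version back from the same file on the next run is what lets the script skip channels whose DNS version is <= current.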
