http://bugzilla.spamassassin.org/show_bug.cgi?id=3852

           Summary: better way to have user configs
           Product: Spamassassin
           Version: 3.0.0
          Platform: Other
        OS/Version: other
            Status: NEW
          Severity: normal
          Priority: P5
         Component: Libraries
        AssignedTo: [email protected]
        ReportedBy: [EMAIL PROTECTED]


OK, here's something that in my opinion we need to do for 3.1.0. Currently, we
have this situation:

    - spamd daemon starts
    - creates $spamtest Mail::SpamAssassin object
    - daemon reads system-wide configuration (rules, scores, description,
      plugins, lots more) into $spamtest->{conf}.  this is very heavyweight
    - clones the current {conf} object to a "backup"
      using Mail::SpamAssassin::copy_config()
    - forks N processes:
      - foreach request:
        - accepts request from user
        - gets username
        - reads config into $spamtest->{conf}
        - processes request
        - returns scan results
        - clones the "backup" into $spamtest->{conf}, overwriting any
          per-user changes

This is very inefficient.  Although I haven't yet benchmarked it, the size of
the {conf} data structure, with all its rules, scores, et al is huge... and I
suspect this is probably the main reason for bug 3839, the "80% of spamd child
memory is unshared" bug.

I suggest we move to this model instead:

    - spamd daemon starts
    - creates $spamtest Mail::SpamAssassin object
    - daemon reads system-wide configuration (rules, scores, description,
      plugins, lots more) into $spamtest->{conf}->{system}.
    - forks N processes:
      - foreach request:
        - accepts request from user
        - gets username
        - reads config into $spamtest->{conf}->{user}.
        - processes request
        - returns scan results
        - deletes $spamtest->{conf}->{user}.

The key change would be:

pro:
  - the "clone" operation and all copying it performs using Storable, would no
longer be required
  - churning through memory should be reduced; I have a feeling this has a lot
to do with that unshared-memory bug

con:
  - however, we'll have to change code that *accesses* config items to use
another method to get that data.

What I'm suggesting is the following.

  - we implement the "dual-layer" model as above.

  - Most scalar accesses to $conf->{config_item_name} become accesses to
$conf->get(ID).   ID is a numeric constant exported by Constants.pm. (why use
numeric constants?  Simple -- looking up values in a hashref using string
hash-keys is slower than looking up values in an arrayref using integer 
indexes.)

  - More complex accesses must become double-layer aware, so that they can
iterate over hashes and hashes-of-hashes without requiring they be copied first.

  - there should be another API method with get()-like semantics for plugins,
which supports looking up config values using string keys, in a hash, so that
plugins have a sensible method to lookup whatever configuration settings they
too have set.  (reason: we can't know in advance what config settings plugins
use, and it'd be very inflexible to require they register ID-code namespace in
advance.  so let them use strings.)


The downside is that reads of the config data would possibly be slower -- but I
think we may be able to ameliorate that with judicious caching; and I'll bet
that it won't be as slow as the overall time when the Storable copy_config()
operation is included.



------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

Reply via email to