OK, here's a trick I was thinking about. Currently we have these massive
hashtable refs:

    $pms->{conf}->{rbl_evals}
                  {head_tests}
                  {body_tests}
                  ....
                  {scoreset}->[0,1,2,3]
                  {tflags}

Each of those is keyed by the name of the rule.

Now the thing is, this is really wasteful, speed-wise (not really
RAM-wise) -- just performing all those hash lookups!   When a message is
scanned, each of the _evals and _tests hashes is iterated over,
extracting the rule name and rule text for every entry. In reality, we
only need the rule text at this point, *not* the name.

  - We have about 700 rules

  - 99% of the time, any given rule will NOT fire, so we should speed up:

        foreach my $rulepat (@{all_rules_of_given_type}) {
          ...
          if ($whatever =~ /$rulepat/) {
            # hit!
          }
          # otherwise miss!
        }

    we should speed up the 'foreach', the rule-text fetch, and the 'miss'.
    Note that we don't need to know the rule name until the rule gives
    us a hit!
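
To make the loop above concrete, here is a minimal sketch of the idea,
not the actual SpamAssassin code. The rule names and patterns are made up,
the %body_tests hash only stands in for one of the real per-type hashes,
and precompiling with qr// is my own assumption. The hot loop touches
only the pattern array; the name is looked up by the same index, and only
on a hit.

```perl
use strict;
use warnings;

# Hypothetical rule table; names and patterns are invented for illustration.
my %body_tests = (
    VIAGRA_SUBJ => 'v[i1]agra',
    FREE_MONEY  => 'free\s+money',
    CLICK_HERE  => 'click\s+here',
);

# Build two parallel arrays once, at config-load time. The same integer
# index addresses a rule's compiled pattern and its name.
my (@rule_pats, @rule_names);
for my $name (sort keys %body_tests) {
    my $pat = $body_tests{$name};
    push @rule_names, $name;
    push @rule_pats,  qr/$pat/i;
}

# Scan one piece of text and return the names of the rules that hit.
sub scan {
    my ($text) = @_;
    my @hits;
    for my $i (0 .. $#rule_pats) {
        next unless $text =~ $rule_pats[$i];   # miss: the common case
        push @hits, $rule_names[$i];           # hit: only now fetch the name
    }
    return @hits;
}

my @hits = scan("Click here for free money");
print "hits: @hits\n";   # prints "hits: CLICK_HERE FREE_MONEY"
```

The miss path does one array fetch and one match, with no string-keyed
lookup at all.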

so I'm thinking that we should replace parts of this with arrays, using
integer indexes, instead of hashes with string indexes.

Array lookups are quite a bit faster than hash lookups.
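
That claim is easy to measure on your own perl. A rough sketch using the
core Benchmark module follows; the rule names and the stored values are
placeholders, and the numbers will vary by perl version and machine, so
treat this as a way to check rather than as a result.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $n = 700;    # roughly the rule count mentioned above

# Placeholder data: the same values stored by name and by integer index.
my @names    = map { "RULE_$_" } 1 .. $n;
my %by_name  = map { $_ => length $_ } @names;
my @by_index = map { length $_ } @names;

# Walk every entry via string-keyed hash lookups.
sub sum_hash {
    my $t = 0;
    $t += $by_name{$_} for @names;
    return $t;
}

# Walk every entry via integer-indexed array lookups.
sub sum_array {
    my $t = 0;
    $t += $by_index[$_] for 0 .. $#by_index;
    return $t;
}

# Run each sub for at least 1 CPU second and print a comparison table.
cmpthese(-1, { hash => \&sum_hash, array => \&sum_array });
```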

Each array would have RAM usage of -- guessing -- (size_of_whats_stored +
9100) bytes, since arrays in perl have an overhead of about 13 bytes per
entry (700 rules x 13 bytes = 9100).  (This is about the same as hashes
iirc, possibly a bit less.  Not sure if there'd be RAM savings there,
since perl hash keys are refcounted shared strings iirc.)

we can optimize for the rules that are loaded from the system-wide config,
because (a) allow_user_rules is almost always off, and (b) even if it's
on, I'd guess that most times 99% of the rules that a scan runs would be
system-wide rules anyway.   (we can deal with user-rules by just pushing
them onto the rules array when they're defined, same as the system rules
are done.)
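
A small sketch of that last point, assuming a hypothetical add_rule()
helper (nothing by that name exists in the current code, and the rule
names are invented): system rules and user rules go through the same
push, so a user rule simply gets the next integer index.

```perl
use strict;
use warnings;

my (@rule_pats, @rule_names);

# Hypothetical helper: register one rule and return its integer index.
sub add_rule {
    my ($name, $pattern) = @_;
    push @rule_names, $name;
    push @rule_pats,  qr/$pattern/;
    return $#rule_pats;
}

# System-wide rules are loaded first...
add_rule(SYS_RULE_A => 'foo');
add_rule(SYS_RULE_B => 'bar');
my $n_system = @rule_pats;      # number of system-wide rules

# ...and a user rule, if allowed, is appended the same way.
my $idx = add_rule(USER_RULE_X => 'baz');
print "user rule index: $idx\n";   # prints "user rule index: 2"
```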

--j.
