Re: Capturing and reusing strings for matching across rules

Michael Storz Sat, 14 May 2022 07:54:19 -0700

On 2022-05-14 14:27, Henrik K wrote:

On Fri, May 07, 2021 at 07:23:05PM +0300, Henrik K wrote:

On Fri, May 07, 2021 at 09:07:08AM -0700, Loren Wilton wrote:
> > > >  header __SUB_CAP Subject:Capture /Your (\w+) Order/i $(__COMPANY)=\1
> > >
> > > Would :capture play well with (e.g.) :addr, :name, :raw, etc?
> >
> > It might as well be a tflag or something.  Why limit capturing to headers
> > only?
>
> I hadn't intended it to be limited to headers only, but I guess the syntax
> would have to be a little different for raw, body, full, etc, since they
> don't have a part keyword in the rule syntax.

Perl already has named capture groups as legit syntax, so it would be most
simple to actually use them.

https://perldoc.perl.org/perlre#(?%3CNAME%3Epattern)

header FROM_NAME /^From: "(?<NAME>\w+)/

... just save the matches in the rule code
$pms->{captured_values}->{FROM_NAME}->{NAME} = $+{NAME};

Then use it in a rule:

body MATCHER /My name is ${FROM_NAME:NAME}/

Don't nitpick on ${}, could be any similar syntax. Code adds this rule to
FROM_NAME dependency chain.  When FROM_NAME hits, run MATCHER regex
(obviously first recompile the regexp).
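
A minimal sketch of the capture-then-recompile flow described above, written in Python purely for illustration (SpamAssassin itself is Perl, and this is not the code from the bug report). The names `captured_values`, `run_capturing_rule` and `compile_dependent_rule` are hypothetical. Python spells named groups `(?P<NAME>...)` where Perl uses `(?<NAME>...)`, and the captured text is escaped before it is spliced into the dependent pattern, which is an assumption about how the real implementation should behave.

```python
import re

# Hypothetical stand-in for the per-message state ($pms->{captured_values}).
captured_values = {}

# Matches ${RULE:NAME} placeholders inside a dependent rule's pattern.
PLACEHOLDER = re.compile(r"\$\{(\w+):(\w+)\}")


def run_capturing_rule(rule_name, pattern, text):
    """Run a rule; on a hit, store its named captures under the rule name."""
    match = re.search(pattern, text)
    if match is None:
        return False
    captured_values[rule_name] = match.groupdict()
    return True


def compile_dependent_rule(template):
    """Fill in captured values, then recompile the pattern.

    Returns None when a referenced capture is not available, i.e. the
    rule it depends on has not hit.
    """
    missing = False

    def substitute(m):
        nonlocal missing
        value = captured_values.get(m.group(1), {}).get(m.group(2))
        if value is None:
            missing = True
            return ""
        # Escape so the captured text is matched literally.
        return re.escape(value)

    pattern = PLACEHOLDER.sub(substitute, template)
    return None if missing else re.compile(pattern)


# FROM_NAME hits first, then MATCHER is built from its capture.
run_capturing_rule("FROM_NAME", r'^From: "(?P<NAME>\w+)',
                   'From: "Alice" <alice@example.com>')
matcher = compile_dependent_rule(r"My name is ${FROM_NAME:NAME}")
print(bool(matcher.search("Hello. My name is Alice.")))
```

Returning None for a missing capture mirrors the dependency-chain idea: MATCHER simply does not run unless FROM_NAME has hit.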


Implementation pending:

https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7992

After Henrik has presented his implementation, I guess I have to tell you what I have been working on lately. I am working on a general Tag.pm plugin. I took the Tagmatch.pm plugin from Paul and rewrote and extended it. With Paul's plugin you can do all kinds of operations on tags (I use tag instead of tagmatch because this looks similar to the header and body keywords). I extended it with a settag command that allows you to extract data from header, body or other tags via regexp and assign it to a tag. These tags can then be used as usual. Coming back to the Esp.pm plugin: for me the definition for an ESP looks like this:


####################
#
# Mailchimp
#
####################

# header field X-MC-User has the customer-id
  settag        _LRZ_MCID_              X-MC-User =~ /^([0-9a-z]{25})$/

# check of tag _LRZ_MCID_, different possibilities

# askdns __LRZ_MCID_FOUND _LRZ_MCID_.esp.dnsbl.lrz.de A 127.0.0.5
# tag    __LRZ_MCID_FOUND _LRZ_MCID_ =~ /^566e95f0930918dfb8d575a40$/
# header __LRZ_MCID_FOUND eval:check_in_addrlist('_LRZ_MCID_', Mailchimp)

# tflags        __LRZ_MCID_FOUND        tagify

  header        __LRZ_MCID_FOUND        eval:check_tag_in_addrlist('_LRZ_MCID_', Mailchimp)

# all Mailchimp emails have the X-Mailer header set to "MailChimp Mailer"

  header        __LRZ_XM_MAILCHIMP      X-Mailer =~ /MailChimp\sMailer/

# scoring rule

  meta          LRZ_MCID_FOUND          (__LRZ_MCID_FOUND && __LRZ_XM_MAILCHIMP)

  score         LRZ_MCID_FOUND          7.2

# list of Mailchimp-IDs
  enlist_addrlist       (Mailchimp)     4ecb620f8ed264d1d84aa0981
  enlist_addrlist       (Mailchimp)     566e95f0930918dfb8d575a40
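
To make the data flow of the rules above easier to follow, here is a rough Python model of them. It is only an illustration of the logic, not how the Tag.pm plugin is implemented; the helper names (`settag`, `check_tag_in_addrlist`, `evaluate`) and the dictionary-based headers and tags are assumptions made for the sketch.

```python
import re

# Stand-in for the two enlist_addrlist entries above.
ADDRLISTS = {
    "Mailchimp": {
        "4ecb620f8ed264d1d84aa0981",
        "566e95f0930918dfb8d575a40",
    }
}


def settag(tags, tag_name, headers, header_name, pattern):
    """Store the first capture group of `pattern` on a header as a tag."""
    value = headers.get(header_name)
    if value is None:
        return
    match = re.search(pattern, value)
    if match:
        tags[tag_name] = match.group(1)


def check_tag_in_addrlist(tags, tag_name, list_name):
    """True when the tag is set and its value is a member of the list."""
    value = tags.get(tag_name)
    return value is not None and value in ADDRLISTS.get(list_name, set())


def evaluate(headers):
    """Return the score LRZ_MCID_FOUND would contribute for these headers."""
    tags = {}
    settag(tags, "LRZ_MCID", headers, "X-MC-User", r"^([0-9a-z]{25})$")
    mcid_found = check_tag_in_addrlist(tags, "LRZ_MCID", "Mailchimp")
    xm_mailchimp = re.search(r"MailChimp\sMailer",
                             headers.get("X-Mailer", "")) is not None
    # meta rule: both sub-rules must hit before the score applies.
    return 7.2 if (mcid_found and xm_mailchimp) else 0.0


listed = {"X-MC-User": "566e95f0930918dfb8d575a40",
          "X-Mailer": "MailChimp Mailer"}
print(evaluate(listed))
```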

At the moment I am working on the tflags tagify. This should take a normal eval function and automatically allow the usage of tags as arguments. With the above example, it takes the eval function check_in_addrlist, which normally would only allow strings as argument, and makes it work with tags instead. At the moment I have to use the eval function check_tag_in_addrlist, where the ability to work with a tag is coded into the function. The other thing which I have not done yet is using tags in regexps, like the example above:


body MATCHER /My name is ${FROM_NAME:NAME}/

I had the same idea: instead of the explicit representation _TAG_ for the tag TAG, you could use the alternative form ${TAG} in regular expressions (and maybe templates). And the last point is modifier functions, like Henrik implemented for the HEADER tag: :addr, :name, :trim, :base64, :domain, :lc, :uc, :pop, :first, you name it. It would be best if these modifier functions could be registered by a plugin and then used similarly to eval functions, which are also registered and then used.
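
As a sketch of what registering modifier functions could look like, here is a small Python model. Everything in it is an assumption for illustration: the `register_modifier` decorator, the `${TAG:mod1:mod2}` chaining syntax, and the exact behaviour of each modifier (for example, `domain` here just takes the text after the last `@`).

```python
import re

# Registry that plugins would add their modifier functions to.
MODIFIERS = {}


def register_modifier(name):
    """Decorator that registers a modifier function under `name`."""
    def decorator(func):
        MODIFIERS[name] = func
        return func
    return decorator


@register_modifier("lc")
def _lc(value):
    return value.lower()


@register_modifier("uc")
def _uc(value):
    return value.upper()


@register_modifier("trim")
def _trim(value):
    return value.strip()


@register_modifier("domain")
def _domain(value):
    return value.rsplit("@", 1)[-1]


# ${TAG} optionally followed by any number of :modifier suffixes.
TAG_REF = re.compile(r"\$\{(\w+)((?::\w+)*)\}")


def expand(template, tags):
    """Replace each ${TAG:mod...} with the tag value after its modifiers.

    Raises KeyError for an unknown tag or modifier.
    """
    def substitute(m):
        value = tags[m.group(1)]
        for name in filter(None, m.group(2).split(":")):
            value = MODIFIERS[name](value)
        return value

    return TAG_REF.sub(substitute, template)


print(expand("sender domain: ${SENDER:trim:domain:lc}",
             {"SENDER": "  bob@Example.org "}))
```

Keeping the modifiers in a registry, as with eval functions, means a new modifier only needs a registration call and no change to the expansion code.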

For a good and efficient use of lists, a full rewrite of the WLBL.pm plugin is needed. E.g. enlist_addrlist can be used for Mailchimp because the customer id is a lowercase hex string, whereas the cid for Salesforce uses lowercase and uppercase chars. Therefore we need lists where we can specify the syntax of the list members.
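
One way to picture lists with a per-list member syntax is the following Python sketch. The `TypedList` class is hypothetical, and the Salesforce pattern is only a placeholder for "mixed-case alphanumeric", not the real cid format.

```python
import re


class TypedList:
    """A named list whose members must match a declared syntax."""

    def __init__(self, name, member_pattern):
        self.name = name
        self._syntax = re.compile(member_pattern)
        self._members = set()

    def add(self, member):
        # Reject entries that do not fit the list's declared syntax.
        if not self._syntax.fullmatch(member):
            raise ValueError(f"{member!r} is not a valid {self.name} member")
        self._members.add(member)

    def __contains__(self, candidate):
        # Exact, case-sensitive membership test.
        return candidate in self._members


mailchimp = TypedList("Mailchimp", r"[0-9a-z]{25}")
mailchimp.add("566e95f0930918dfb8d575a40")

# Placeholder syntax: the real Salesforce cid format is not given here.
salesforce = TypedList("Salesforce", r"[0-9A-Za-z]+")
salesforce.add("00Dx0AbC")

print("566e95f0930918dfb8d575a40" in mailchimp)
print("00dx0abc" in salesforce)
```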

However, to fully work out this design, I believe more time is needed, and such functionality should not be incorporated into SpamAssassin until after the 4.0 release. First the handling of the tags must be improved, which is currently totally broken. I am still writing up where the problems with the tags are and how to fix them.


Michael
