On 2022-05-14 14:27, Henrik K wrote:
On Fri, May 07, 2021 at 07:23:05PM +0300, Henrik K wrote:
On Fri, May 07, 2021 at 09:07:08AM -0700, Loren Wilton wrote:
> > > >  header __SUB_CAP Subject:Capture /Your (\w+) Order/i $(__COMPANY)=\1
> > >
> > > Would :capture play well with (e.g.) :addr, :name, :raw, etc?
> >
> > It might as well be a tflag or something.  Why limit capturing to headers
> > only?
>
> I hadn't intended it to be limited to headers only, but I guess the syntax
> would have to be a little different for raw, body, full, etc, since they
> don't have a part keyword in the rule syntax.

Perl already has named capture groups as legit syntax, so it would be
simplest to actually use them.

https://perldoc.perl.org/perlre#(?%3CNAME%3Epattern)

header FROM_NAME /^From: "(?<NAME>\w+)/

... just save the matches in the rule code:
$pms->{captured_values}->{FROM_NAME}->{NAME} = $+{NAME};

Then use it in a rule:

body MATCHER /My name is ${FROM_NAME:NAME}/

Don't nitpick on ${}, could be any similar syntax. Code adds this rule to
FROM_NAME dependency chain.  When FROM_NAME hits, run MATCHER regex
(obviously first recompile the regexp).
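
A minimal sketch of the capture-and-substitute idea described above, written in Python purely for illustration (SpamAssassin itself is Perl, and Python spells named groups (?P<NAME>...) rather than (?<NAME>...)). The names captured_values, run_capture_rule and compile_dependent are hypothetical and not part of any SpamAssassin API; the ${RULE:GROUP} placeholder is taken from the MATCHER example.

```python
import re

# Hypothetical store, mirroring $pms->{captured_values} from the mail above.
captured_values = {}

# ${RULE:GROUP} placeholder, as in the MATCHER example.
PLACEHOLDER = re.compile(r"\$\{(\w+):(\w+)\}")


def run_capture_rule(rule_name, pattern, text):
    """Run a rule; if it hits, save the named groups that matched."""
    match = re.search(pattern, text)
    if match is None:
        return False
    captured_values[rule_name] = {
        name: value
        for name, value in match.groupdict().items()
        if value is not None
    }
    return True


def compile_dependent(template):
    """Fill captured values into a rule template and compile it.

    Returns None when a referenced rule or group has no captured value,
    so the dependent rule is simply not run.
    """
    def substitute(m):
        rule, group = m.group(1), m.group(2)
        # Escape the value so the captured text is matched literally.
        return re.escape(captured_values[rule][group])

    try:
        return re.compile(PLACEHOLDER.sub(substitute, template))
    except KeyError:
        return None


# FROM_NAME hits first, then MATCHER is compiled from its capture.
run_capture_rule("FROM_NAME", r'^From: "(?P<NAME>\w+)',
                 'From: "Alice" <alice@example.com>')
matcher = compile_dependent(r"My name is ${FROM_NAME:NAME}")
hit = matcher is not None and matcher.search("Hi. My name is Alice.") is not None
```

Escaping the captured value is a design choice in this sketch: it keeps text taken from the message from being interpreted as regexp syntax when the dependent rule is recompiled.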

Implementation pending:

https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7992

Now that Henrik has presented his implementation, I should describe what I have been working on lately: a general Tag.pm plugin. I took Paul's Tagmatch.pm plugin, rewrote it, and extended it. Paul's plugin lets you perform all kinds of operations on tags (I use the keyword tag instead of tagmatch because it looks more like the header and body keywords). I added a settag command that extracts data from a header, the body, or another tag via a regexp and assigns it to a tag. These tags can then be used as usual. Coming back to the Esp.pm plugin: for me, the definition of an ESP looks like this:

####################
#
# Mailchimp
#
####################

# header field X-MC-User has the customer-id
  settag        _LRZ_MCID_              X-MC-User =~ /^([0-9a-z]{25})$/

# check of tag _LRZ_MCID_, different possibilities
# askdns __LRZ_MCID_FOUND _LRZ_MCID_.esp.dnsbl.lrz.de A 127.0.0.5
# tag __LRZ_MCID_FOUND _LRZ_MCID_ =~ /^566e95f0930918dfb8d575a40$/
# header __LRZ_MCID_FOUND eval:check_in_addrlist('_LRZ_MCID_', Mailchimp)
# tflags        __LRZ_MCID_FOUND        tagify
header __LRZ_MCID_FOUND eval:check_tag_in_addrlist('_LRZ_MCID_', Mailchimp)

# all Mailchimp emails have the X-Mailer header set to "MailChimp Mailer"
  header        __LRZ_XM_MAILCHIMP      X-Mailer =~ /MailChimp\sMailer/

# scoring rule
meta LRZ_MCID_FOUND (__LRZ_MCID_FOUND && __LRZ_XM_MAILCHIMP)
  score         LRZ_MCID_FOUND          7.2

# list of Mailchimp-IDs
  enlist_addrlist       (Mailchimp)     4ecb620f8ed264d1d84aa0981
  enlist_addrlist       (Mailchimp)     566e95f0930918dfb8d575a40
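
To make the flow of these rules explicit, here is a minimal Python sketch of the same logic: extract the tag, look it up in the list, check X-Mailer, then combine both subrules. It is an illustration only and not how the Tag.pm plugin is implemented; the evaluate function and the dict-based header and tag stores are assumptions.

```python
import re

# List contents copied from the enlist_addrlist lines above.
MAILCHIMP_IDS = {
    "4ecb620f8ed264d1d84aa0981",
    "566e95f0930918dfb8d575a40",
}

MCID_RE = re.compile(r"^([0-9a-z]{25})$")
XMAILER_RE = re.compile(r"MailChimp\sMailer")


def evaluate(headers):
    """Return (tags, hits) for a dict of header name -> header value."""
    tags, hits = {}, set()

    # settag: extract the customer ID from X-MC-User into a tag.
    m = MCID_RE.search(headers.get("X-MC-User", ""))
    if m:
        tags["_LRZ_MCID_"] = m.group(1)

    # check_tag_in_addrlist: is the tag value a member of the list?
    if tags.get("_LRZ_MCID_") in MAILCHIMP_IDS:
        hits.add("__LRZ_MCID_FOUND")

    # header rule on X-Mailer.
    if XMAILER_RE.search(headers.get("X-Mailer", "")):
        hits.add("__LRZ_XM_MAILCHIMP")

    # meta rule: both subrules must hit.
    if {"__LRZ_MCID_FOUND", "__LRZ_XM_MAILCHIMP"} <= hits:
        hits.add("LRZ_MCID_FOUND")
    return tags, hits


tags, hits = evaluate({
    "X-MC-User": "566e95f0930918dfb8d575a40",
    "X-Mailer": "MailChimp Mailer - **CID1234**",
})
```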

At the moment I am working on the tagify tflag. It should take a normal eval function and automatically allow tags as arguments. In the example above, it would take the eval function check_in_addrlist, which normally accepts only strings as arguments, and make it work with tags instead. For now I have to use the eval function check_tag_in_addrlist, where the ability to work with a tag is coded into the function itself. The other thing I have not done yet is using tags in regexps, as in Henrik's example above:

body MATCHER /My name is ${FROM_NAME:NAME}/

I had the same idea: instead of the explicit form _TAG_ for the tag TAG, you could use the alternative form ${TAG} in regular expressions (and maybe in templates). The last point is modifier functions, like the ones Henrik implemented for the HEADER tag: :addr, :name, :trim, :base64, :domain, :lc, :uc, :pop, :first, you name it. Ideally a plugin could register these modifier functions, and they could then be used the same way eval functions are registered and used.
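
A rough Python sketch of what registered modifier functions could look like. Everything here is hypothetical: the register_modifier decorator, the expand function, and in particular the chained ${TAG:modifier:modifier} spelling are assumptions made for illustration, not an existing or proposed SpamAssassin syntax.

```python
import re

# Registry filled by "plugins"; maps modifier name -> function.
MODIFIERS = {}


def register_modifier(name):
    """Decorator a plugin could use to register a modifier by name."""
    def decorator(func):
        MODIFIERS[name] = func
        return func
    return decorator


@register_modifier("lc")
def modifier_lc(value):
    return value.lower()


@register_modifier("uc")
def modifier_uc(value):
    return value.upper()


@register_modifier("trim")
def modifier_trim(value):
    return value.strip()


@register_modifier("domain")
def modifier_domain(value):
    # Everything after the last "@"; the whole value if there is none.
    return value.rsplit("@", 1)[-1]


# ${TAG} optionally followed by :modifier parts (assumed spelling).
TAG_REF = re.compile(r"\$\{(\w+)((?::\w+)*)\}")


def expand(template, tags):
    """Replace each ${TAG...} with the tag value, modifiers applied in order."""
    def substitute(m):
        value = tags[m.group(1)]
        for name in filter(None, m.group(2).split(":")):
            value = MODIFIERS[name](value)
        return value
    return TAG_REF.sub(substitute, template)


result = expand("${SENDER:trim:domain:uc}", {"SENDER": "  bob@Example.org "})
```

Here result is "EXAMPLE.ORG": the value is trimmed, reduced to its domain part, and uppercased, in that order. An unknown tag or modifier raises KeyError in this sketch; a real plugin would more likely report a lint error.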

For good and efficient use of lists, a full rewrite of the WLBL.pm plugin is needed. For example, enlist_addrlist can be used for Mailchimp because the customer ID is a lowercase hex string, whereas the customer ID (cid) for Salesforce uses both lowercase and uppercase characters. We therefore need lists for which the syntax of the list members can be specified.
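
One way to picture such a list, sketched in Python under stated assumptions: each list declares a member pattern and whether lookups are case sensitive. The TypedList class is hypothetical, and the Salesforce pattern and sample ID are placeholders chosen for the example, not the real format.

```python
import re


class TypedList:
    """A named list whose members must match a declared syntax."""

    def __init__(self, name, member_pattern, case_sensitive=True):
        self.name = name
        self.member_re = re.compile(member_pattern)
        self.case_sensitive = case_sensitive
        self.members = set()

    def _normalize(self, value):
        # Case-insensitive lists store and compare lowercased values.
        return value if self.case_sensitive else value.lower()

    def add(self, value):
        value = self._normalize(value)
        if not self.member_re.fullmatch(value):
            raise ValueError(f"{value!r} is not a valid member of {self.name}")
        self.members.add(value)

    def __contains__(self, value):
        return self._normalize(value) in self.members


# Mailchimp: lowercase IDs, pattern taken from the settag rule above.
mailchimp = TypedList("Mailchimp", r"[0-9a-z]{25}")
mailchimp.add("566e95f0930918dfb8d575a40")

# Salesforce: mixed-case IDs; pattern and sample value are made up.
salesforce = TypedList("Salesforce", r"[0-9A-Za-z]{15}")
salesforce.add("00Dxx0000001gPz")
```

With a case-sensitive list, "00Dxx0000001gPz" is a member but "00dxx0000001gpz" is not, which is the distinction a list that lowercases its entries cannot express.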

However, fully working out this design will take more time, I believe, and such functionality should not be incorporated into SpamAssassin until after the 4.0 release. First, the handling of tags, which is currently totally broken, must be improved. I am still writing up what the problems with tags are and how to fix them.

Michael
