On 2022-05-14 14:27, Henrik K wrote:
On Fri, May 07, 2021 at 07:23:05PM +0300, Henrik K wrote:
On Fri, May 07, 2021 at 09:07:08AM -0700, Loren Wilton wrote:
> > > > header __SUB_CAP Subject:Capture /Your (\w+) Order/i $(__COMPANY)=\1
> > >
> > > Would :capture play well with (e.g.) :addr, :name, :raw, etc?
> >
> > It might as well be a tflag or something. Why limit capturing to headers
> > only?
>
> I hadn't intended it to be limited to headers only, but I guess the syntax
> would have to be a little different for raw, body, full, etc, since they
> don't have a part keyword in the rule syntax.
Perl already has named capture groups as legit syntax, so it would be
simplest to actually use them.
https://perldoc.perl.org/perlre#(?%3CNAME%3Epattern)
header FROM_NAME /^From: "(?<NAME>\w+)/
... just save the matches in the rule code
$pms->{captured_values}->{FROM_NAME}->{NAME} = $+{NAME};
Then use it in a rule:
body MATCHER /My name is ${FROM_NAME:NAME}/
Don't nitpick on ${}, could be any similar syntax. The code adds this
rule to the FROM_NAME dependency chain. When FROM_NAME hits, run the
MATCHER regex (obviously first recompile the regexp).
Implementation pending:
https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7992
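A minimal Perl sketch of the mechanism quoted above, for readers who want to see the steps in one place. It is not the implementation from the bug report. The hash %captured_values stands in for $pms->{captured_values}, and run_capture_rule, lookup_capture and expand_template are hypothetical helper names. The sample header and body text are made up. Quoting the captured value with quotemeta before recompiling is an assumption about what one would want, so that a captured string is matched literally.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $pms->{captured_values} from the quoted snippet.
my %captured_values;

# Run a capturing rule; on a hit, store its named captures under the rule name.
sub run_capture_rule {
    my ($rule, $re, $text) = @_;
    return 0 unless $text =~ $re;
    $captured_values{$rule} = { %+ };    # copy: %+ changes with the next match
    return 1;
}

# Look up one captured value, quoted for literal use inside a regex.
sub lookup_capture {
    my ($rule, $name, $missing_ref) = @_;
    my $value = exists $captured_values{$rule}
              ? $captured_values{$rule}{$name}
              : undef;
    if (!defined $value) {
        $$missing_ref = 1;
        return '';
    }
    return quotemeta($value);
}

# Expand every ${RULE:NAME} placeholder and compile the result.
# Returns undef if any referenced capture is missing.
sub expand_template {
    my ($template) = @_;
    my $missing = 0;
    (my $source = $template) =~
        s/\$\{(\w+):(\w+)\}/lookup_capture($1, $2, \$missing)/ge;
    return $missing ? undef : qr/$source/;
}

# Made-up sample data for illustration.
run_capture_rule('FROM_NAME', qr/^From: "(?<NAME>\w+)/,
                 'From: "Alice Example" <alice@example.com>');
my $matcher = expand_template('My name is ${FROM_NAME:NAME}');
print "MATCHER hits\n"
    if defined $matcher && 'Hello. My name is Alice.' =~ $matcher;
```

Returning undef for a missing capture lets the caller skip the dependent rule instead of compiling a regex with an empty hole in it.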
Now that Henrik has presented his implementation, I guess I should tell
you what I have been working on lately. I am working on a general Tag.pm
plugin. I took the Tagmatch.pm plugin from Paul and rewrote and extended
it. With Paul's plugin you can do all kinds of operations on tags (I use
the keyword tag instead of tagmatch because it looks similar to the
header and body keywords). I extended it with a settag command that
allows you to extract data from a header, the body or other tags via
regexp and assign it to a tag. These tags can then be used as usual.
Coming back to the Esp.pm plugin: for me the definition for an ESP
looks like this:
####################
#
# Mailchimp
#
####################
# header field X-MC-User has the customer-id
settag _LRZ_MCID_ X-MC-User =~ /^([0-9a-z]{25})$/
# check of tag _LRZ_MCID_, different possibilities
# askdns __LRZ_MCID_FOUND _LRZ_MCID_.esp.dnsbl.lrz.de A 127.0.0.5
# tag __LRZ_MCID_FOUND _LRZ_MCID_ =~ /^566e95f0930918dfb8d575a40$/
# header __LRZ_MCID_FOUND eval:check_in_addrlist('_LRZ_MCID_', Mailchimp)
# tflags __LRZ_MCID_FOUND tagify
header __LRZ_MCID_FOUND eval:check_tag_in_addrlist('_LRZ_MCID_', Mailchimp)
# all Mailchimp emails have the X-Mailer header set to "MailChimp Mailer"
header __LRZ_XM_MAILCHIMP X-Mailer =~ /MailChimp\sMailer/
# scoring rule
meta LRZ_MCID_FOUND (__LRZ_MCID_FOUND && __LRZ_XM_MAILCHIMP)
score LRZ_MCID_FOUND 7.2
# list of Mailchimp-IDs
enlist_addrlist (Mailchimp) 4ecb620f8ed264d1d84aa0981
enlist_addrlist (Mailchimp) 566e95f0930918dfb8d575a40
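To make the flow of the active rules above concrete, here is a rough stand-alone model in Perl. It is only a sketch and does not use the SpamAssassin plugin API. The hashes %tags, %addrlist and %headers and the Perl function settag are hypothetical stand-ins, and the header values are made-up sample data. Only the regexps, the list entries and the score are taken from the rules.

```perl
use strict;
use warnings;

# Hypothetical in-memory model of the configuration above.
my %tags;        # tag name => value
my %addrlist = ( # list name => set of members (the enlist_addrlist lines)
    Mailchimp => {
        '4ecb620f8ed264d1d84aa0981' => 1,
        '566e95f0930918dfb8d575a40' => 1,
    },
);

# settag: if the header value matches, store the first capture group as a tag.
sub settag {
    my ($tag, $value, $re) = @_;
    return 0 unless defined $value && $value =~ $re;
    $tags{$tag} = $1;
    return 1;
}

# check_tag_in_addrlist: true if the tag is set and its value is in the list.
sub check_tag_in_addrlist {
    my ($tag, $list) = @_;
    my $value = $tags{$tag};
    return 0 unless defined $value && exists $addrlist{$list};
    return $addrlist{$list}{$value} ? 1 : 0;
}

# Made-up sample headers for illustration.
my %headers = (
    'X-MC-User' => '566e95f0930918dfb8d575a40',
    'X-Mailer'  => 'MailChimp Mailer',
);

settag('_LRZ_MCID_', $headers{'X-MC-User'}, qr/^([0-9a-z]{25})$/);
my $mcid_found   = check_tag_in_addrlist('_LRZ_MCID_', 'Mailchimp');
my $xm_mailchimp = $headers{'X-Mailer'} =~ /MailChimp\sMailer/ ? 1 : 0;

# meta LRZ_MCID_FOUND (__LRZ_MCID_FOUND && __LRZ_XM_MAILCHIMP)
my $score = ($mcid_found && $xm_mailchimp) ? 7.2 : 0;
print "LRZ_MCID_FOUND score: $score\n";
```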
At the moment I am working on the tagify tflag. It should take a
normal eval function and automatically allow the use of tags as
arguments. In the example above, it takes the eval function
check_in_addrlist, which normally only accepts strings as arguments,
and makes it work with tags instead. For now I have to use the eval
function check_tag_in_addrlist, where the ability to work with a tag is
coded into the function itself. The other thing I have not done yet is
using tags in regexps, as in Henrik's example above:
body MATCHER /My name is ${FROM_NAME:NAME}/
I had the same idea: instead of the explicit representation _TAG_ for
the tag TAG, you could use the alternative form ${TAG} in regular
expressions (and maybe templates). The last point is modifier
functions, like the ones Henrik implemented for the HEADER tag: :addr,
:name, :trim, :base64, :domain, :lc, :uc, :pop, :first, you name it. It
would be best if these modifier functions could be registered by a
plugin and then used just like eval functions, which are also
registered before use.
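The two ideas in this paragraph, tagify and registrable modifier functions, could be sketched in plain Perl roughly as follows. This is a guess at the shape, not the plugin code. The names tagify, register_modifier and apply_modifiers are hypothetical Perl functions, %tags and %addrlist are stand-in state, and treating an unset tag as "no hit" is an assumption.

```perl
use strict;
use warnings;

# Hypothetical state: tag values, list members, registered modifiers.
my %tags     = (LRZ_MCID => '566e95f0930918dfb8d575a40');
my %addrlist = (Mailchimp => { '566e95f0930918dfb8d575a40' => 1 });
my %modifiers;

# A plain eval-style function that only understands literal strings.
sub check_in_addrlist {
    my ($value, $list) = @_;
    return 0 unless exists $addrlist{$list};
    return $addrlist{$list}{$value} ? 1 : 0;
}

# tagify: wrap a function so that arguments written as _TAG_ are replaced
# by the value of tag TAG before the call. An unset tag means "no hit".
sub tagify {
    my ($func) = @_;
    return sub {
        my @resolved;
        for my $arg (@_) {
            if ($arg =~ /^_(\w+)_$/) {
                return 0 unless defined $tags{$1};
                push @resolved, $tags{$1};
            }
            else {
                push @resolved, $arg;
            }
        }
        return $func->(@resolved);
    };
}

# Modifier registry, analogous to how eval functions are registered.
sub register_modifier {
    my ($name, $code) = @_;
    $modifiers{$name} = $code;
}

# Apply the named modifiers from left to right.
sub apply_modifiers {
    my ($value, @names) = @_;
    for my $name (@names) {
        my $code = $modifiers{$name} or die "unknown modifier :$name\n";
        $value = $code->($value);
    }
    return $value;
}

register_modifier('lc',   sub { lc $_[0] });
register_modifier('trim', sub { my $s = $_[0]; $s =~ s/^\s+|\s+$//g; $s });

my $check = tagify(\&check_in_addrlist);
print "tagified check: ", $check->('_LRZ_MCID_', 'Mailchimp'), "\n";
print "modified value: '", apply_modifiers('  MailChimp ', 'trim', 'lc'), "'\n";
```

With a wrapper like this, check_in_addrlist itself stays unaware of tags, which is the point of the tflag: the tag handling lives in one place instead of in every eval function.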
For good and efficient use of lists, a full rewrite of the WLBL.pm
plugin is needed. E.g. enlist_addrlist can be used for Mailchimp because
the customer id is a lowercase hex string, whereas the cid for
Salesforce uses lowercase and uppercase chars. Therefore we need lists
where we can specify the syntax of the list members.
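One way to picture such lists is as a set of members paired with a pattern that every member must match. The Perl sketch below assumes that shape. The functions define_list, enlist and in_list are hypothetical, the Salesforce pattern is an illustrative guess, and the Salesforce id is made up. Members are stored exactly as written, so lookups stay case-sensitive.

```perl
use strict;
use warnings;

my %lists;   # list name => { syntax => qr/.../, members => { ... } }

# Declare a list together with the syntax its members must follow.
sub define_list {
    my ($name, $syntax) = @_;
    $lists{$name} = { syntax => $syntax, members => {} };
}

# Add a member as written (no case folding); reject malformed entries.
sub enlist {
    my ($name, $member) = @_;
    my $list = $lists{$name} or die "unknown list: $name\n";
    die "bad member for $name: $member\n" unless $member =~ $list->{syntax};
    $list->{members}{$member} = 1;
}

# True if the value is a member of the named list.
sub in_list {
    my ($name, $value) = @_;
    my $list = $lists{$name} or return 0;
    return $list->{members}{$value} ? 1 : 0;
}

define_list('Mailchimp',  qr/^[0-9a-z]{25}$/);
define_list('Salesforce', qr/^[0-9A-Za-z]{15,18}$/);   # assumed syntax

enlist('Mailchimp',  '566e95f0930918dfb8d575a40');
enlist('Salesforce', '00D5e000000HnKp');               # made-up id

print "exact case:  ", in_list('Salesforce', '00D5e000000HnKp'), "\n";
print "folded case: ", in_list('Salesforce', lc '00D5e000000HnKp'), "\n";
```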
However, I believe fully working out this design needs more time, and
such functionality should not be incorporated into SpamAssassin until
after the 4.0 release. First the handling of tags must be improved,
which is currently totally broken. I am still writing up where the
problems with tags are and how to fix them.
Michael