On Sat, May 08, 2021 at 08:51:29AM -0700, Loren Wilton wrote:
> > An alternative approach is creating new strings from parsed data:
> > 
> > string  TO_BODY = TO:addr ":" BODY(500)
> > 
> > string  TO_BODY ~= /<whatever>/
> > 
> > The advantage of this is that there are no dependencies.
> > 
> > I'm thinking that BODY(500) would be a multi-line string constructed
> > from the first 500 bytes of the rendered body. For me, having multi-line
> > body matching is more important than any of this.
> 
> Hum, interesting.
> 
> As a small nitpick, maybe it's just my 40 years of programming C++, but I'm
> bothered by using 'string' for both the creation operation and rule-parsing
> operation. I realize they are differentiated by the operator, but that just
> seems too easy to screw up, at least for me and my bad eyesight. Maybe
> 'makestring' and 'string' or some other non-identical pair of words,
> whatever seems nice.
> 
> I assume you would want BODY, RAWBODY, FULL, etc. as possibilities. At least
> I would.
> 
> I think that rather than the character count, I'd do a range: BODY(1:500) or
> the like. This lets you capture from an offset location.
> 
> Actually I think I'd prefer a regex there, at least as an alternative:
> BODY/(.{500})/m to get the equivalent first 500 characters. Or BODY/Your
> order number (\d+)/ to get a capture of a specific thing from the body.
> 
> Thoughts?
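To make the proposed semantics concrete, here is a minimal Python sketch of what TO_BODY = TO:addr ":" BODY(500), a BODY(1:500) range, and a BODY/regex/ capture might evaluate to. None of this syntax exists in SpamAssassin today; the helper names (make_to_body, body_slice, body_capture) are hypothetical, the sketch slices characters rather than bytes, and it assumes 1-based inclusive ranges. One detail worth noting: in Perl-style regexes, `.` crosses newlines only under /s (re.DOTALL here), not /m, so a "first 500 characters" capture over a multi-line body needs /s.

```python
import re
from typing import Optional


def make_to_body(to_addr: str, body: str, limit: int = 500) -> str:
    """Hypothetical TO_BODY = TO:addr ":" BODY(limit): the To address,
    a colon, then the first `limit` characters of the rendered body."""
    return to_addr + ":" + body[:limit]


def body_slice(body: str, start: int, end: int) -> str:
    """Hypothetical BODY(start:end): 1-based, inclusive character range."""
    return body[start - 1:end]


def body_capture(body: str, pattern: str, flags: int = 0) -> Optional[str]:
    """Hypothetical BODY/pattern/: first capture group of the first match, or None."""
    m = re.search(pattern, body, flags)
    return m.group(1) if m else None


if __name__ == "__main__":
    body = "Hello Bob,\nYour order number 12345 has shipped.\n"
    to_body = make_to_body("bob@example.com", body)
    # A rule on the combined string can tie the recipient to body text;
    # re.DOTALL lets .* cross the newline inside the body.
    print(bool(re.search(r"^bob@example\.com:.*order number", to_body, re.DOTALL)))
    print(body_slice(body, 1, 5))
    print(body_capture(body, r"Your order number (\d+)"))
```

One design point: (.{500}) only matches when the body has at least 500 characters, whereas a slice simply returns a shorter string; (.{1,500}) is the closer regex equivalent of BODY(1:500).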

There's already a bug discussing something like this:

https://bz.apache.org/SpamAssassin/show_bug.cgi?id=4691

But really, Perl, memory, and CPU are not bottlenecks anymore; there is no
problem matching ~50k of text with a regexp.  The simplest solution is just to
ditch the "body chunking" completely in 4.0, or to implement new methods for
full body matching like I proposed:

https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7745
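A toy illustration of why chunking gets in the way of multi-line rules: if the body is matched piece by piece, a phrase that straddles a piece boundary is never seen, while the same regex over the whole text finds it. This is a simplified Python model, not SpamAssassin's actual splitting logic; the fixed chunk size and the helper names (chunk, match_chunked, match_full) are assumptions made for the example.

```python
import re
from typing import List


def chunk(text: str, size: int) -> List[str]:
    """Simplified chunker: fixed-size slices (SpamAssassin's real splitting differs)."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def match_chunked(pattern: str, text: str, size: int) -> bool:
    """Chunked matching: the rule hits only if some single chunk matches."""
    rx = re.compile(pattern)
    return any(rx.search(piece) for piece in chunk(text, size))


def match_full(pattern: str, text: str) -> bool:
    """Full-body matching: one search over the entire text."""
    return re.search(pattern, text) is not None


if __name__ == "__main__":
    text = "Dear customer,\nclick here\nto claim your prize"
    # \s+ spans the line break between "here" and "to".
    pattern = r"click here\s+to claim"
    print(match_chunked(pattern, text, 20), match_full(pattern, text))
```

With a chunk size larger than the message, the two approaches agree; the difference only shows up when a match crosses a chunk boundary.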
