On Sat, May 08, 2021 at 08:51:29AM -0700, Loren Wilton wrote:
> > An alternative approach is creating new strings from parsed data:
> >
> > string TO_BODY = TO:addr ":" BODY(500)
> >
> > string TO_BODY ~= /<whatever>/
> >
> > the advantage of this is that there are no dependencies.
> >
> > I'm thinking that BODY(500) would be a multi-line string constructed
> > from the first 500 byte of the rendered body. For me having multi-line
> > body matching is more important than any of this.
>
> Hum, interesting.
>
> As a small nitpick, maybe it's just my 40 years programming C++, but I'm
> bothered by using 'string' for both the creation operation and rule-parsing
> operation. I realize they are differentiated by the operator, but that just
> seems too easy to screw up, at least for me and my bad eyesight. Maybe
> 'makestring' and 'string' or some other non-identical pair of words,
> whatever seems nice.
>
> I assume you would want BODY, RAWBODY, FULL, etc. as possibilities. At least
> I would.
>
> I think that rather than the character count, I'd do a range: BODY(1:500) or
> the like. This lets you capture from an offset location.
>
> Actually I think I'd prefer a regex there, at least as an alternative:
> BODY/(.{500})/m to get the equivalent first 500 characters. Or BODY/Your
> order number (\d+)/ to get a capture of a specific thing from the body.
>
> Thoughts?

There's already a bug discussing something like this:

https://bz.apache.org/SpamAssassin/show_bug.cgi?id=4691

But really, Perl, memory and CPU are not a bottleneck anymore; there is no
problem matching a ~50k text with a regexp.

The simplest solution is to ditch the "body chunking" completely in 4.0, or
to implement new methods for full body matching like I proposed:

https://bz.apache.org/SpamAssassin/show_bug.cgi?id=7745
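
To illustrate why full body matching matters, here is a rough sketch. It is in Python rather than Perl, the message body is made up, and the line-by-line loop is only a simplified stand-in for chunked matching, not SpamAssassin's actual code. The function names are hypothetical.

```python
import re

# Made-up message body of roughly 50 kB. The phrase of interest is wrapped
# across a line break, as often happens in real mail.
filler = "Lorem ipsum dolor sit amet.\n" * 900
body = filler + "Your order\nnumber 12345 has shipped.\n" + filler

# \s+ also matches a newline, so the pattern can span a line break.
pattern = re.compile(r"Your\s+order\s+number\s+(\d+)")


def match_per_line(text):
    """Simplified stand-in for chunked matching: test each line on its own."""
    for line in text.splitlines():
        m = pattern.search(line)
        if m:
            return m.group(1)
    return None


def match_whole(text):
    """Match once against the entire body as a single multi-line string."""
    m = pattern.search(text)
    return m.group(1) if m else None


print(match_per_line(body))  # the wrapped phrase is never seen in one piece
print(match_whole(body))     # the capture is found in the full text
```

Under these assumptions the per-line loop misses the wrapped phrase, while a single search over the whole text finds it.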