On Sat, Feb 22, 2020 at 12:31:46PM -0800, Michael Peddemors wrote:
> For the record, we did some investigation into this (best default max_size)
> and I should point out, there are some spammers that specifically appear to
> attempt to game the max_size limits, eg including a larger attachment, to
> hopefully bypass the scanning size.
>
> I don't think it matters what size you choose, those bad actors will try to
> send something bigger. There are other ways to address really big files,
> rather than content scanning I would suggest.
>
> However we did try various sizes, to see how the affect on overhead would be
> in a typical deployment, and the original size suggestion is definitely too
> big.

What is a "typical deployment"? What original size suggestion? The 2MB?
What overhead? CPU? Memory? What SA version was used?

> While we now configure at a larger size than the default, going more than 1G
> will cause a serious impact, especially against older mail servers who might
> have a more limiting hardware spec.

1G? 1MB? What older mail servers? How much CPU and memory might those
generally have? Pre-queue scanning is not a good idea on tiny servers. For
post-queue scanning, it's normal that the queue might occasionally take a few
more seconds to process.

> For those who want to play, experiment with a two pass filtering, where
> the first pass is only about message headers, and pulling attachment types
> and names, and the second pass does a deeper content filtering, and this
> way you can limit the more resource intensive second content scan to a
> smaller size, while the first scan can be a lot less resource intensive
> and handle much larger file sizes.

Sounds way too complicated compared to just scanning everything as is.

I've been scanning all mail fully for years without problems (note that I've
always used trunk/4.0). The only thing my amavisd does is truncate large
messages to 16MB. :-)

It makes no practical difference whether you scan 512k or 2MB messages. We are
talking about maybe 25MB of extra memory usage per SA process. Tiny servers do
not usually need to process many concurrent scans. Tiny servers have more
problems running e.g. ClamAV with all third party sigs, since it alone
requires 1-2GB of dedicated memory.

Even 16MB mails use only around 330MB per process. Servers with 16GB+ memory
are a dime a dozen these days. One is more likely to hit CPU core limits than
memory limits on busy mail servers.

There's no practical difference in scan times either. Perhaps you can
describe in more detail what the "serious impact" is that you found?

<512k mails

$ find pristine_ham -size +400000c -a -size -500000c | xargs -L 1 /usr/bin/time spamassassin -t -L >/dev/null
1.60user 0.05system 0:01.67elapsed 99%CPU (0avgtext+0avgdata 132316maxresident)k
1096inputs+0outputs (1major+34827minor)pagefaults 0swaps
1.63user 0.09system 0:01.73elapsed 99%CPU (0avgtext+0avgdata 132580maxresident)k
968inputs+0outputs (0major+34618minor)pagefaults 0swaps
1.52user 0.07system 0:01.60elapsed 99%CPU (0avgtext+0avgdata 132284maxresident)k
816inputs+0outputs (0major+34915minor)pagefaults 0swaps
2.40user 0.07system 0:02.48elapsed 99%CPU (0avgtext+0avgdata 135268maxresident)k
880inputs+0outputs (0major+35548minor)pagefaults 0swaps

1MB mails

$ find pristine_ham -size +900000c -a -size -1100000c | xargs -L 1 /usr/bin/time spamassassin -t -L >/dev/null
2.12user 0.06system 0:02.18elapsed 99%CPU (0avgtext+0avgdata 139380maxresident)k
0inputs+0outputs (0major+36647minor)pagefaults 0swaps
1.64user 0.05system 0:01.69elapsed 99%CPU (0avgtext+0avgdata 141888maxresident)k
0inputs+0outputs (0major+36991minor)pagefaults 0swaps
1.63user 0.05system 0:01.68elapsed 99%CPU (0avgtext+0avgdata 143304maxresident)k
0inputs+0outputs (0major+37496minor)pagefaults 0swaps
1.73user 0.06system 0:01.79elapsed 99%CPU (0avgtext+0avgdata 141792maxresident)k
0inputs+0outputs (0major+37058minor)pagefaults 0swaps

2MB mails

$ find pristine_ham -size +1900000c -a -size -2100000c | xargs -L 1 /usr/bin/time spamassassin -t -L >/dev/null
1.63user 0.07system 0:01.71elapsed 99%CPU (0avgtext+0avgdata 155816maxresident)k
3848inputs+0outputs (0major+40952minor)pagefaults 0swaps
1.85user 0.09system 0:01.95elapsed 99%CPU (0avgtext+0avgdata 160848maxresident)k
3968inputs+0outputs (0major+41985minor)pagefaults 0swaps
1.66user 0.11system 0:01.77elapsed 99%CPU (0avgtext+0avgdata 151560maxresident)k
3920inputs+0outputs (0major+39746minor)pagefaults 0swaps
1.77user 0.10system 0:01.89elapsed 99%CPU (0avgtext+0avgdata 158072maxresident)k
3904inputs+0outputs (0major+41117minor)pagefaults 0swaps

4MB mails

$ find pristine_ham -size +3900000c -a -size -4100000c | xargs -L 1 /usr/bin/time spamassassin -t -L >/dev/null
1.82user 0.09system 0:01.93elapsed 99%CPU (0avgtext+0avgdata 187296maxresident)k
7736inputs+0outputs (0major+49103minor)pagefaults 0swaps
1.92user 0.09system 0:02.01elapsed 99%CPU (0avgtext+0avgdata 186936maxresident)k
0inputs+0outputs (0major+49239minor)pagefaults 0swaps
1.97user 0.05system 0:02.04elapsed 99%CPU (0avgtext+0avgdata 186508maxresident)k
7648inputs+0outputs (0major+49107minor)pagefaults 0swaps
1.85user 0.08system 0:01.94elapsed 99%CPU (0avgtext+0avgdata 187908maxresident)k
7736inputs+0outputs (0major+49221minor)pagefaults 0swaps

8MB mails

$ find pristine_ham -size +7900000c -a -size -8100000c | xargs -L 1 /usr/bin/time spamassassin -t -L >/dev/null
2.11user 0.13system 0:02.24elapsed 100%CPU (0avgtext+0avgdata 241152maxresident)k
0inputs+0outputs (0major+63696minor)pagefaults 0swaps
1.98user 0.12system 0:02.11elapsed 99%CPU (0avgtext+0avgdata 221096maxresident)k
0inputs+0outputs (0major+58613minor)pagefaults 0swaps
2.20user 0.10system 0:02.31elapsed 99%CPU (0avgtext+0avgdata 251400maxresident)k
0inputs+0outputs (0major+66194minor)pagefaults 0swaps

16MB mails

$ find pristine_ham -size +15900000c -a -size -16100000c | xargs -L 1 /usr/bin/time spamassassin -t -L >/dev/null
2.50user 0.20system 0:02.73elapsed 99%CPU (0avgtext+0avgdata 329520maxresident)k
31368inputs+0outputs (0major+87581minor)pagefaults 0swaps
2.79user 0.14system 0:02.97elapsed 99%CPU (0avgtext+0avgdata 333800maxresident)k
31352inputs+0outputs (0major+88512minor)pagefaults 0swaps

Cheers,
Henrik
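
P.S. For anyone who wants to repeat the measurements above and condense
them, here is a minimal sketch that boils the /usr/bin/time output down to
one line of peak-memory figures per run. It assumes GNU time's default
output format (which reports "maxresident" in KB) and a grep that supports
-o. The function name summarize_rss is made up for illustration and is not
part of SpamAssassin or any other tool.

```shell
# Hypothetical helper: reads GNU time output on stdin and prints the
# number of runs plus min/mean/max peak RSS in MB.
# Assumes the default GNU time format, e.g. "... 132316maxresident)k",
# where the number before "maxresident" is in KB.
summarize_rss() {
    # Keep only the "<digits>maxresident" tokens, then strip the letters
    # so that each line holds a single KB value.
    grep -o '[0-9][0-9]*maxresident' | tr -d 'a-z' | awk '
        NR == 1 || $1 < min { min = $1 }   # smallest peak RSS seen so far
        $1 > max            { max = $1 }   # largest peak RSS seen so far
                            { sum += $1 }  # running total for the mean
        END {
            # Print nothing if no timing lines were found.
            if (NR > 0)
                printf "runs=%d min=%.1fMB mean=%.1fMB max=%.1fMB\n",
                       NR, min / 1024, sum / NR / 1024, max / 1024
        }'
}
```

GNU time writes to stderr, so that stream has to be sent into the pipe
while spamassassin's own stdout is still discarded:

  find pristine_ham -size +1900000c -a -size -2100000c |
      xargs -L 1 /usr/bin/time spamassassin -t -L 2>&1 >/dev/null |
      summarize_rss

Fed the four 2MB records quoted above, this would print
"runs=4 min=148.0MB mean=152.9MB max=157.1MB".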
