Ian P. Christian wrote:
> 2009/2/3 Nigel Metheringham <[email protected]>:
>> By definition this box is only getting the least deliverable messages.
>>
>> Which would make me wonder about the idea of very frequent queue
>> runners (but feel free to show me this feeling is wrong).
>
Hmmm... well, having actually come from that '4WPM' telegraph industry, I'd 
submit that the retry intervals and timeouts that were appropriate in the 
FidoNet, BBS and UUCP era no longer fit. Back then most networks weren't 
really networks, or were at least slower, less reliable, and not always up. 
We might be better served to rethink whether we *should even attempt* 
delivery for anywhere near as long as we once had to.

- Folks nowadays have come to rely on SMTP for faster and more 'certain' 
delivery than was once expected. And it delivers that, well beyond 
expectations, and cheaply.

- But that leads us to 'trust' it for more time-sensitive traffic than 
the traditional 3- or 4-day retry timeout actually serves.

- Most users in today's environment would prefer to 'be aware' of a 
problem sooner. Much sooner.

After all, phone and fax are also cheaper than they were in FidoNet 
days. If a message is of sufficient importance or time-sensitivity, a 
failure DSN sent 'soonest' allows that sort of fallback, or a manual re-send.

> You are correct - this server is full of mail to domains that are
> currently not accepting mail, hosts that impose greylisting, or any
> other reason for the mail not being immediately deliverable.
>

IF your traffic is 'clean' (i.e. not relayed spam etc.), very little of 
what the primary cannot deliver will *ever* be deliverable by a 
'fallback' outbound critter.

Vanishingly little, unless of course your service is being suborned 
into spewing spam, acting as an open relay, supporting a dictionary 
attack from infected boxen on your inside net, or some such rudeness.

Quit early. Let the primary send a DSN back to your authenticated 
submission client (and no one else), and the message is off the queue 
while they correct the spelling or get an email address for their 
correspondent that actually works.

Getting such a DSN back to them from a fallback box that they never 
connect or authenticate to is tedious at best, and risks compounding the 
problem at worst.

> This isn't a problem I can break down by domain, as we're talking
> about mail going from inside our network to outside.
> 

ACK.  I'd simply tune up the primary(ies) and shut the sucker down.

Where you want fallback/failover is on the inbound side so you don't 
become one of those unreachable domains.

;-)

...and/or a 'pool' of outbound servers, but as peers, not a cascade.

> The idea of breaking down the problem by time was to allow for a
> fallback host to handle mail for the first 4 hours, where it might be
> being greylisted - allowing for the queue runners to quickly deal with
> such things, and not get bogged down with 10k's of older mail.
> 

I've found greylisting (for all its negatives) NOT to be a significant 
issue. It is *supposed to* affect only the first message, generally does 
so, and thereafter becomes essentially invisible to the sender.

I doubt it has any significant contribution to the balked deliveries on 
your primary that now clog the fallback queue.

But 'undeliverable' usually means just that. It is not all that often that 
a message deliverable on day 'x' was undeliverable in the first few minutes 
(milliseconds, even...).

Not even when most destinations are third-world.

> I'm open to suggestions that I'm potentially dealing with the issue
> incorrectly, I'm certainly not set on the idea of multi-stage
> fallbacks.  I do remember this being demonstrated by Phil at a
> conference I went to in Cambridge though....
>

Specialty case. Exim can handle all manner of those, but we should not 
always ask it to handle 'edge' cases.

The traffic figures you cite sound an awful lot like an abused box or 
user pool with compromised machines.

Question: your fallback server. Are you certain that no submission can 
be made to it *except* by your own primary? E.g. port 25 is not 
listening, and/or it is bound only to an internal NIC and IP.
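One way to sanity-check that is to collect the addresses the fallback box actually listens on, for example from `ss -ltn` output or from Exim's `local_interfaces` setting. Then flag any SMTP listener that is not on a private or loopback address. The minimal Python sketch below does this with the standard `ipaddress` module. The function name, the port set and the sample addresses are assumptions. "Private address" is only a proxy for "internal NIC", so firewall rules still matter.

```python
import ipaddress

SMTP_PORTS = {"25", "465", "587"}  # SMTP, SMTPS, submission (assumed set)


def exposed_listeners(listen_addrs: list[str], ports: set[str] = SMTP_PORTS) -> list[str]:
    """Return SMTP listening sockets bound to a non-internal address (hypothetical helper)."""
    exposed = []
    for addr in listen_addrs:
        host, _, port = addr.rpartition(":")
        if port not in ports:
            continue
        host = host.strip("[]")  # "[::]:25" -> "::"
        if host in ("*", ""):
            exposed.append(addr)  # wildcard: listening on every interface
            continue
        ip = ipaddress.ip_address(host)
        # Check "unspecified" first: 0.0.0.0 / :: mean "all interfaces",
        # and Python's is_private can be True for 0.0.0.0.
        if ip.is_unspecified or not (ip.is_private or ip.is_loopback):
            exposed.append(addr)
    return exposed


listeners = ["127.0.0.1:25", "10.0.0.5:25", "0.0.0.0:25", "[::]:587",
             "81.2.69.142:25", "81.2.69.142:2525", "*:465"]
print(exposed_listeners(listeners))
```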

>> You do want to ensure that messages have been routed, so that when a
>> delivery succeeds, another message can be attempted in the same session.
> 
> Sorry, can you expand on what you mean here?

AFAIK, that means that if delivery is not taking place on the 'primary' 
box, subsequent messages for the same destination that are still on the 
primary are not in the same queue (yet), so they just sit there. Further, 
any updated routability info the fallback box gleans will not be shared. 
One could find a way to share the caches and the retry/hints DB, but that 
probably adds to the wrong side of the complexity scorecard. 'KISS'
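The "same session" point can be shown with a toy model. Once messages are routed, the MTA knows which remote host each one is waiting for. After one successful delivery, it can send the others queued for that host over the same SMTP connection. Messages split across two boxes cannot share a connection. The Python sketch below is a minimal illustration under those assumptions. The message IDs, hostnames and helper names are made up, and Exim's real mechanism (as I understand it, per-transport hints databases) is more involved.

```python
from collections import defaultdict


def plan_sessions(routed: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group routed (message_id, remote_host) pairs so each host needs one session."""
    sessions: dict[str, list[str]] = defaultdict(list)
    for msg_id, host in routed:
        sessions[host].append(msg_id)
    return dict(sessions)


def connections_needed(*boxes: list[tuple[str, str]]) -> int:
    """Each box opens its own sessions; messages on different boxes can't share one."""
    return sum(len(plan_sessions(box)) for box in boxes)


queue = [("msg-1", "mx1.example.net"), ("msg-2", "mx.example.org"), ("msg-3", "mx1.example.net")]
print(plan_sessions(queue))
print(connections_needed(queue))               # everything on the primary
print(connections_needed(queue[:2], queue[2:]))  # msg-3 moved to a fallback box
```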

> 
>> Tweaking of timeouts to avoid tarpits may be useful.
> 
> Any suggestions here would be very welcome.
> 
> Thanks for all those who have posted so far.
> 

I suspect you'll do the most good by taking a fresh look at how your 
primary is set up...

And analyzing the traffic sitting in the queue.

Hint:

SSH in, invoke a simple browser (lynx, links, or such). Point that 
browser into the queue, wander about, and see what the headers and such 
look like.

I'll bet a lot of it is garbage that should never have made it there.
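For a quicker overview than browsing, `exim -bp` lists the queue and `exiqsumm` summarizes it by domain. If you want to slice it your own way, the minimal Python sketch below counts undelivered recipient domains in `exim -bp`-style text. It assumes that format: a header line with age, size, message ID and `<sender>`, then one indented recipient per line, where a leading `D` (or `+D`) marks a recipient already delivered. Check that against your Exim version. The sample queue is invented.

```python
import re
from collections import Counter

# Assumed header shape: age (e.g. "4d"), size, message id (x-y-z), then <sender>.
HEADER = re.compile(r"^\s*\d+[smhdw]\s+\S+\s+\S+-\S+-\S+\s")


def recipient_domains(bp_output: str) -> Counter:
    """Count undelivered recipient domains in `exim -bp`-style output (hypothetical helper)."""
    counts: Counter = Counter()
    for line in bp_output.splitlines():
        if not line.strip() or HEADER.match(line):
            continue  # blank separator or per-message header line
        addr = line.strip()
        if addr.startswith(("D ", "+D ")):
            continue  # already delivered to this recipient
        addr = addr.split()[-1]
        if "@" in addr:
            counts[addr.rsplit("@", 1)[1].lower()] += 1
    return counts


sample = """\
 4d  3.1K 1LTk2P-0003xS-Lm <user@ourdomain.example>
          bob@dead-domain.example
          carol@dead-domain.example

25h  1.2K 1LTxyz-0004ab-Cd <other@ourdomain.example>
        D dave@fine.example
          erin@Greylisted.example
"""
print(recipient_domains(sample).most_common())
```

A handful of dead domains holding most of the queue points to tuning the primary's retry rules. Thousands of random domains point to abuse.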

Bill


-- 
## List details at http://lists.exim.org/mailman/listinfo/exim-users 
## Exim details at http://www.exim.org/
## Please use the Wiki with this list - http://wiki.exim.org/
