Again cutting out some bits.

>>> Actually it is a lot more complex than that, [snip]
>>> 
>> And the standard quadratic back off maximum delay is four hours.
>> 
>> For comparison, the old zmailer (http://www.zmailer.org/man/scheduler.8zm.html)
>> let us set retry intervals (e.g. interval and retries; see the examples). I
>> can understand the reasoning behind quadratic back off, but in the worst-case
>> scenario it will not detect that a remote system is back for up to four
>> hours.
>> 
> 
> Actually, it is more complex than that:
> 
There's a theme starting here :-)

> - there are two kinds of delays; the network / 421 ones impose a quadratic
>   back off
>
> - the "too many protocol-level errors" ones impose a biased delay so that
>   there are more attempts at shorter intervals IIRC (need to double check, we
>   did a lot of work there)
>
> - the routing table keeps track of temporary errors and of envelopes that
>   have failed, so that if an envelope is accepted by a host that was down,
>   envelopes that failed and had their delay increased can be reattempted
>   right away.
> 
> There's still work to do in that area; it is possible that in some
> situations the various mechanisms don't play well together, which is why I
> need to figure out what goes wrong in your setup.
> 
Ok -- let me know what I can do either here or via IRC. I can provide log files 
and might be able to give you access at an appropriate time.

> 
>> I guess my question is: is there a way to make OpenSMTPD put an MX into
>> temporary failure sooner rather than later?
>> 
> 
> Well, it depends on what you're trying to do; I don't really understand it
> yet. If you have an MX that was disabled and you want to make it routable
> again, we have a way with smtpctl to re-enable a disabled route. We don't
> have a way to disable an active MX, but this can be done quite easily; we
> just didn't have a use case.
> 
> 
Hotmail seems to start replying 'you have sent too many messages per hour' and
we don't seem to take the hint. Instead we keep trying to deliver many
messages, which I think makes Hotmail treat us as a bot.
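
A minimal sketch of what 'taking the hint' could look like, assuming a
scheduler that, after one temporary failure from a domain, holds every queued
envelope for that domain instead of trying each one in turn. DomainGate, its
15-minute hold and the release-on-success rule are hypothetical illustrations
(the release loosely mirrors the routing-table behaviour described above);
this is not OpenSMTPD code.

```python
from dataclasses import dataclass, field


@dataclass
class DomainGate:
    """Hypothetical per-domain hold: one temporary failure defers the whole domain."""

    hold_seconds: float = 15 * 60  # assumed hold time, not an OpenSMTPD default
    held_until: dict[str, float] = field(default_factory=dict)

    def record_tempfail(self, domain: str, now: float) -> None:
        # e.g. a 4xx "too many messages per hour": stop sending to this domain.
        self.held_until[domain] = now + self.hold_seconds

    def record_success(self, domain: str) -> None:
        # The host accepted a message again, so release the held envelopes now.
        self.held_until.pop(domain, None)

    def may_send(self, domain: str, now: float) -> bool:
        return now >= self.held_until.get(domain, 0.0)


def deliverable(envelopes: list[tuple[int, str]], gate: DomainGate,
                now: float) -> list[tuple[int, str]]:
    """Return the (envelope id, domain) pairs that may be attempted at time `now`."""
    return [env for env in envelopes if gate.may_send(env[1], now)]
```

With a hold recorded for hotmail.com at t=0 and a queue of two hotmail.com
envelopes and one gmail.com envelope, only the gmail.com envelope is
deliverable at t=60; at t=900 all three are.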

>> Also the 'different order' can cause interesting problems -- newer emails in 
>> the queue overtaking older ones.
>> 
> 
> Not really, it only affects a small, bounded number of envelopes in a batch,
> and it affects them by imposing a delay that gets compensated for.
> 
> If you ever enter a state where the mechanism kicks in, in the worst case
> scenario you will have 10 envelopes that will have been delayed by a few
> seconds, and in the best scenario this prevents you from hammering YaMoo
> if they expect you to wait.
> 
> You should really read the code if you're interested in that logic; it's
> quite difficult to explain by mail without making charts to describe the
> state of the scheduler at each quadratic step (I can, but not today as I'm a
> bit too busy at the moment).
> 

Ok -- will look for it again and count the number of messages affected :-)

> 
>>> Similarly Yahoo and others will stop accepting connections if they think 
>>> you are sending too many messages. At this point, OpenSMTPD backs away and 
>>> the quadratic back off cuts in, which increases the retry time.
>>> Unfortunately these services will accept messages in a finite time.
>>> 
>>> 
>>> I don't quite get the "accept messages in a finite time" part.
>> 
>> Yahoo doesn't seem to mind being contacted later; it seems that OpenSMTPD
>> backs off quadratically, while Yahoo (and others such as Mimecast) don't
>> mind being retried after a fixed interval (e.g. 15 mins).
>> 
> 
> mh, ok, so is your problem the fact that it's a quadratic increase, or the
> fact that for some hosts a constant delay is preferable?
> 
I think some hosts will work better with a constant delay; that might be
achieved by lowering the upper limit on the quadratic back off. It may also be
the case that, after a delay, OpenSMTPD tries to deliver too many messages
while the other end is still reporting a temporary failure, and that might
cause some services to mistake the activity for a bot.
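
To make the comparison concrete, a small sketch assuming the back off has the
form base * n^2 seconds capped at a maximum. The 300 s base is an arbitrary
illustration and quadratic_delay/schedule are hypothetical helpers; only the
four-hour ceiling comes from this thread. Lowering the cap to 15 minutes turns
the tail of the schedule into a constant delay.

```python
def quadratic_delay(attempt: int, base: int, cap: int) -> int:
    """Delay in seconds before retry `attempt` (1-based): base * attempt**2, never above cap."""
    return min(base * attempt * attempt, cap)


def schedule(attempts: int, base: int, cap: int) -> list[int]:
    """Delays for the first `attempts` retries."""
    return [quadratic_delay(n, base, cap) for n in range(1, attempts + 1)]


BASE = 300             # illustrative base step in seconds (assumption)
FOUR_HOURS = 4 * 3600  # maximum delay mentioned in the thread
FIFTEEN_MIN = 15 * 60  # the interval Yahoo and Mimecast seem happy with

four_hour_cap = schedule(8, BASE, FOUR_HOURS)
fifteen_min_cap = schedule(8, BASE, FIFTEEN_MIN)
```

With these numbers the four-hour schedule is [300, 1200, 2700, 4800, 7500,
10800, 14400, 14400] and the 15-minute one is [300, 900, 900, 900, 900, 900,
900, 900]. Once the cap is reached, a host that recovers just after an attempt
goes unnoticed for just under one cap interval: four hours versus 15 minutes.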

> 
>>> To get round these problems, we have been playing with 'smtpctl schedule' 
>>> and restarting the server. In particular, with Yahoo we can see this 
>>> behaviour:
>>> 
>>> * messages not being delivered;
>>> * schedule the messages -- still not being delivered;
>>> * stop and start the daemon;
>>> * schedule the messages and they are delivered;
>>> 
>>> The problem with this is that the routing information is runtime
>>> information, which gets lost across a restart.
>>> If you're having problems with the default limits, we should really
>>> understand what the problem is, what limits should be used, and then make
>>> them the default.
>>> 
>> 
>> I completely agree and am happy to try things. Again, to go back into
>> ancient history, the zmailer boilerplate scheduler.conf file used to have
>> examples of different strategies (which are mentioned in the manual page
>> above and also on the old zmailer website).
>> 
> 
> I understand; you should open a feature request on the bug tracker so that
> it can be discussed technically and not forgotten.
>
> It is a non-trivial issue. We will clearly not add knobs to tweak every
> aspect of the scheduler since that goes against our goals, but there surely
> are ways to improve the current state for large volumes.
> 
Will do.

> 
>> Again, will do. I also noticed that I can set system-wide limits by
>> removing [domain etc]. I guess there ought to be a feature request to
>> document these config options in the manual page (not everyone wants to
>> read limit.c).
>> 
> 
> Well, we hope people won't tweak limits; we primarily added them to help us
> find proper generic limits. Only very limited use cases should require
> tweaking them, and we'd prefer these to be limited to people who truly
> understand what they imply :-)
> 
Self documenting code :-)

> 
>>> - how many sources are you using ?
>> 
>> Do you mean how many machines? Can I be evasive here and say a few?
>> 
> 
> I was more referring to how many IP addresses, based on the limits you applied
> I assumed you had a single machine with many IP addresses.
> 
> 
No, a few sources.


>>> - over how much time are you sending these volumes ?
>>> 
>> It tends to be very bursty -- people like their newsletters to land very 
>> soon after they push the send button. 
>> 
> 
> Ok, so basically when you have 100K mails, you enqueue them right away and
> do not spread them out over time. This could be the difference between your
> use case and mine, and it could be a hint as to why the mechanisms do not
> work for you when they do for me. We'll discuss this with Eric to see how a
> burst of volume might cause them to fail.
> 
Yes -- it is very bursty. So, 100K per day might be submitted in one go.

> 
>>> We know of systems that use the default limits to exchange hundreds of 
>>> thousands of mails with gmail / yahoo / hotmail daily, I'm curious about 
>>> what could break your setup :-)
>>> 
>> Gmail is not an issue -- it happily swallows everything we send at it.
>> 
>> Hotmail and Yahoo are still seeing some issues. Mimecast 'grey listing'
>> remains a problem, though the fact that we cannot use SPF or DKIM with that
>> client may be a contributing factor. Mimecast can be solved by the 'stop,
>> start and reschedule' sequence, which satisfies their grey listing.
>> 
> 
> We already discussed alternatives to quadratic delaying with Eric. We may
> need to find a different function whose curve slowly increases to 15m and
> stays there a while before increasing again; it was just not that urgent,
> so we didn't investigate much.
> 
Ok
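
One possible shape for that curve, sketched under assumptions: a quadratic
ramp up to a 15-minute plateau, a hold there for a fixed number of retries,
then quadratic growth again up to the four-hour ceiling. plateau_delay and all
of its parameter values are hypothetical; it is not a function from OpenSMTPD.

```python
def plateau_delay(attempt: int, base: int = 60, plateau: int = 15 * 60,
                  hold: int = 8, cap: int = 4 * 3600) -> int:
    """Hypothetical retry delay in seconds for retry `attempt` (1-based):
    quadratic ramp below `plateau`, `hold` retries at `plateau`, then
    quadratic growth above it, never exceeding `cap`."""
    if attempt < 1:
        raise ValueError("attempt is 1-based")
    # Count the ramp retries whose quadratic delay stays below the plateau.
    ramp = 0
    while base * (ramp + 1) ** 2 < plateau:
        ramp += 1
    if attempt <= ramp:
        return base * attempt * attempt
    if attempt <= ramp + hold:
        return plateau
    # Past the plateau: grow quadratically again on top of it.
    k = attempt - ramp - hold
    return min(plateau + base * k * k, cap)
```

With the defaults the first thirteen delays are 60, 240, 540, eight retries
at 900, then 960 and 1140; the four-hour ceiling is reached at retry 26.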

Simon
