Again cutting out some bits.

>>> Actually it is a lot more complex than that, [snip]
>>>
>> And the standard quadratic back off maximum delay is four hours.
>>
>> To compare with old zmailer (http://www.zmailer.org/man/scheduler.8zm.html),
>> we could set retry intervals (e.g. interval and retries -- see the
>> examples). I can understand the reasoning behind quadratic but in the worst
>> case scenario, it will not detect that a remote system is back for four
>> hours.
>>
>
> Actually, it is more complex than that:
>

There's a theme starting here :-)

> - there are two kinds of delays, the network / 421 ones will impose a
>   quadratic back off
>
> - the "too many protocol-level errors" ones will impose a biased delaying
>   so that there are more attempts at lesser intervals IIRC (need to double
>   check, we did a lot of work there)
>
> - the routing table keeps track of temporary errors and envelopes that have
>   failed so that if an envelope is accepted by a host that was down,
>   envelopes that were failed and had their delay increased can be
>   reattempted right away.
>
> There's still work to do in that area, it is not impossible that in some
> situations the various mechanisms don't play too well together, which is
> why I need to figure out what goes wrong with you.
>

Ok -- let me know what I can do either here or via IRC. I can provide log
files and might be able to give you access at an appropriate time.

>
>> I guess my question is do I have a way of making OpenSMTPD put an MX into
>> temporary failure sooner rather than later.
>>
>
> Well, it depends on what you're trying to do, I don't really understand it
> yet.
> If you have an MX that was disabled and you want to make it routable again,
> we have a way with smtpctl to reenable a disabled route. We don't have a
> way to disable an active MX, but this can be done quite easily, we just
> didn't have a use-case.
>
>

Hotmail seems to start sending 'you have sent too many messages per hour'
and we don't seem to take the hint. Instead we seem to keep trying many
messages -- which I think makes Hotmail think we are a bot as we don't take
the hint.

>> Also the 'different order' can cause interesting problems -- newer emails
>> in the queue overtaking older ones.
>>
>
> Not really, it only affects a small and bounded number of envelopes in a
> batch, and it affects them by imposing a delay that gets compensated for.
>
> If you ever enter a state where the mechanism kicks in, in the worst case
> scenario you will have 10 envelopes that will have been delayed by a few
> seconds, and in the best scenario this prevents you from hammering YaMoo
> if they expect you to wait.
>
> You should really read the code in case you're interested in that logic, it's
> quite difficult to explain by mail without making charts to describe the state
> of the scheduler at each quadratic step (I can, but not today as I'm a bit too
> busy at the moment).
>

Ok -- will look for it again and count the number of messages affected :-)

>
>>> Similarly Yahoo and others will stop accepting connections if they think
>>> you are sending too many messages. At this point, OpenSMTPD backs away and
>>> the quadratic back off cuts in which increases the retry time.
>>> Unfortunately these services will accept messages in a finite time.
>>>
>>>
>>> I don't quite get the "accept messages in a finite time" part.
>>
>> Yahoo doesn't seem to mind being contacted later -- it seems to be that
>> OpenSMTPD backs off quadratically while Yahoo (and others such as Mimecast)
>> don't mind being retried in a finite time (e.g. 15 mins).
>>
>
> mh, ok so is your problem the fact that it's a quadratic increase or the fact
> that for some hosts a constant delay is preferable ?
>

I think some hosts will work better with a constant delay -- that might be
achieved by placing a lower upper limit on the quadratic back off. It may
also be the case that after a delay, opensmtpd is trying to deliver too many
messages when the other side is saying there is a temporary failure and that
might cause some services to misinterpret the activity as a bot.

>
>>> To get round these problems, we have been playing with 'smtpctl schedule'
>>> and restarting the server.
>>> In particular, with Yahoo we can see this
>>> behaviour:
>>>
>>> * messages not being delivered;
>>> * schedule the messages -- still not being delivered;
>>> * stop and start the daemon;
>>> * schedule the messages and they are delivered;
>>>
>>> Problem with this is that the routing information is runtime information
>>> which gets lost across a restart.
>>> If you're having problems with the default limits, we should really
>>> understand what is the problem, what limits should be used and then make
>>> them the default.
>>>
>>
>> I completely agree and am happy to try things. Again to go back into ancient
>> history, the zmailer boilerplate scheduler.conf file used to have examples
>> of different strategies (which are mentioned in the manual page above and
>> also on the old zmailer website).
>>
>
> I understand, you should open a feature request on the bug tracker so that
> it can be discussed technically and not forgotten.
>
> It is a non-trivial issue, we will clearly not add knobs to tweak every
> aspect of the scheduler since that goes against our goals, but there surely
> are ways to improve the current state for large volumes.
>

Will do.

>
>> Again will do -- I did also note that I can set system wide limits by
>> removing [domain etc]. I guess also there ought to be a feature request to
>> put these config options in the manual page (not everyone wants to read
>> limit.c).
>>
>
> well, we hope for people not to tweak limits, we primarily added them to
> help us with finding proper generic limits, only very limited use cases
> should require tweaking them and we'd prefer these to be limited to people
> that truly understand what they imply :-)
>

Self documenting code :-)

>
>>> - how many sources are you using ?
>>
>> Do you mean how many machines? Can I be evasive here and say a few.
>>
>
> I was more referring to how many IP addresses, based on the limits you applied
> I assumed you had a single machine with many IP addresses.
>
>

No, a few sources.

>>> - over how much time are you sending these volumes ?
>>>
>> It tends to be very bursty -- people like their newsletters to land very
>> soon after they push the send button.
>>
>
> Ok, so basically when you have 100K mails, you just enqueue them right away
> and do not dilute them in time, this could be the difference between your
> use-case and mine, and it could be a hint to why the mechanisms do not work
> for you when they do for me, we'll discuss this with eric to see how a
> burst of volume might cause them to fail.
>

Yes -- it is very bursty. So, 100K per day might be submitted in one go.

>
>>> We know of systems that use the default limits to exchange hundreds of
>>> thousands of mails with gmail / yahoo / hotmail daily, I'm curious about
>>> what could break your setup :-)
>>>
>> Gmail is not an issue -- it happily swallows everything we send at it.
>>
>> Hotmail and Yahoo are still seeing some issues. Mimecast 'grey listing'
>> remains a problem, but the fact we cannot use SPF or DKIM with that client
>> may also be a contributing factor. Mimecast can be solved by the
>> 'stop, start and reschedule' which satisfies their grey listing.
>>
>
> We already discussed with eric alternatives to quadratic delaying, we
> may need to find a different function with a curve that would slowly
> increase to 15m and stay a while there before increasing again, it was
> just not that urgent so we didn't investigate much.
>

Ok

Simon
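
For anyone following the thread, here is a minimal sketch of the two back off shapes discussed above: a quadratic delay clamped at the four hour maximum, and the same curve clamped at 15 minutes (the "lower upper limit" idea). The formula (base times step squared) and the 400 second base are assumptions for illustration only and are not taken from the OpenSMTPD source; retry_delay and schedule are hypothetical helper names.

```python
# Sketch only: assumes delay = base * step**2, clamped to a ceiling.
# BASE_SECONDS is a hypothetical value, not OpenSMTPD's actual constant.
BASE_SECONDS = 400
FOUR_HOURS = 4 * 60 * 60     # maximum delay mentioned in the thread
FIFTEEN_MINUTES = 15 * 60    # lower ceiling suggested in the thread


def retry_delay(step, base=BASE_SECONDS, ceiling=FOUR_HOURS):
    """Seconds to wait before retry number `step` (1-based)."""
    return min(base * step * step, ceiling)


def schedule(steps, ceiling):
    """Delays for the first `steps` retries under the given ceiling."""
    return [retry_delay(s, ceiling=ceiling) for s in range(1, steps + 1)]


if __name__ == "__main__":
    default = schedule(8, FOUR_HOURS)
    capped = schedule(8, FIFTEEN_MINUTES)
    for step, (d, c) in enumerate(zip(default, capped), start=1):
        print(f"retry {step}: {d:>6}s default, {c:>4}s capped")
```

With these assumed numbers the default curve only reaches its ceiling at the sixth retry, while the capped variant settles into a constant 15 minute interval from the second retry onwards, which is the behaviour that seems to suit Yahoo and Mimecast.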
