On Thu, Oct 17, 2013 at 11:10:18AM +0100, Simon Brock wrote:
> Gilles
> Thanks for the reply -- have trimmed out some bits and answered your questions.
>
> [...]
>
> > This is very likely to get you blacklisted very fast, you're essentially
> > telling OpenSMTPD to blast the hotmail servers by making tons of connexions
> > and by not introducing any delay between your transactions. While this
> > might work for a little while or with little volume, they're going to give
> > your IP addresses and domain a very bad reputation.
> >
> I am not completely sure this is a problem. We have historically used much
> bigger numbers than this without an issue. We had a busy day yesterday so I
> turned this into a default configuration and had no particular problems
> (except for one machine).
>
Your call :-)

> > The problem to be solved is how these services create temporary failures --
> > this is the service telling you to go away and then seeing when you come
> > back
> >
> >
> > Actually it is a lot more complex than that, OpenSMTPD keeps a routing
> > table and keeps track of failures for every route it has managed to
> > establish.
> > It can even differentiate between different kinds of failures (no MX
> > available, MX are available but do not accept mail, MX has started to
> > produce many errors in a row, ...).
> > When a provider produces many temporary failures in a row, OpenSMTPD will
> > assume the MX is having a temporary failure and will mark it unavailable
> > for a while using a quadratic delay strategy + a penalty on the envelopes
> > that were temporarily failed so that they aren't retried in the same order in
> > case this was just a coincidence.
> >
> And the standard quadratic back off maximum delay is four hours.
>
> To compare with old zmailer (http://www.zmailer.org/man/scheduler.8zm.html),
> we could set retry intervals (e.g. interval and retries -- see the examples).
> I can understand the reasoning behind quadratic but in the worst case
> scenario, it will not detect that a remote system is back for four hours.
>

Actually, it is more complex than that:

- there are two kinds of delays, the network / 421 ones will impose a
  quadratic back off

- the "too many protocol-level errors" ones will impose a biased delaying
  so that there are more attempts at lesser intervals IIRC (need to double
  check, we did a lot of work there)

- the routing table keeps track of temporary errors and envelopes that
  have failed so that if an envelope is accepted by a host that was down,
  envelopes that were failed and had their delay increased can be
  reattempted right away.
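To make the quadratic back off discussed above concrete, here is a minimal sketch. It is not the OpenSMTPD scheduler code: it simply assumes the delay after n consecutive failures is base * n^2, capped at the four-hour maximum mentioned in this thread. The 10-second base step and the name `quadratic_delay` are made up for the illustration.

```python
from datetime import timedelta

MAX_DELAY = timedelta(hours=4)   # maximum delay quoted in this thread
BASE = timedelta(seconds=10)     # hypothetical base step, NOT taken from OpenSMTPD


def quadratic_delay(failures: int,
                    base: timedelta = BASE,
                    cap: timedelta = MAX_DELAY) -> timedelta:
    """Delay before the next attempt after `failures` consecutive failures.

    Assumed model: base * failures^2, never exceeding `cap`.
    """
    if failures < 0:
        raise ValueError("failures must be non-negative")
    return min(base * failures * failures, cap)


if __name__ == "__main__":
    # Print how the delay grows with the number of consecutive failures.
    for n in (1, 2, 5, 10, 20, 38):
        print(n, quadratic_delay(n))
```

Under these assumed numbers the cap is first reached at the 38th consecutive failure. Once a route sits at the cap, a host that has come back can go unnoticed for up to four hours, which is the worst case Simon describes.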

There's still work to do in that area, it is not impossible that in some
situations the various mechanisms don't play too well together, which is
why I need to figure out what goes wrong for you.

> I guess my question is do I have a way of making OpenSMTPD put an MX into
> temporary failure sooner rather than later.
>

Well, it depends on what you're trying to do, I don't really understand it
yet.

If you have an MX that was disabled and you want to make it routable again,
we have a way with smtpctl to reenable a disabled route. We don't have a
way to disable an active MX, but this can be done quite easily, we just
didn't have a use-case.

> Also the 'different order' can cause interesting problems -- newer emails in
> the queue overtaking older ones.
>

Not really, it only affects a very small, bounded number of envelopes in a
batch, and it affects them by imposing a delay that gets compensated for.

If you ever enter a state where the mechanism kicks in, in the worst case
scenario you will have 10 envelopes that will have been delayed by a few
seconds, and in the best scenario this prevents you from hammering YaMoo
if they expect you to wait.

You should really read the code in case you're interested in that logic,
it's quite difficult to explain by mail without making charts to describe
the state of the scheduler at each quadratic step (I can, but not today as
I'm a bit too busy at the moment).

> > Similarly Yahoo and others will stop accepting connections if they think
> > you are sending too many messages. At this point, OpenSMTPD backs away and
> > the quadratic back off cuts in which increases the retry time.
> > Unfortunately these services will accept messages in a finite time.
> >
> >
> > I don't quite get the "accept messages in a finite time" part.
>
> Yahoo doesn't seem to mind being contacted later -- it seems to be that
> OpenSMTPD backs off quadratically while Yahoo (and others such as Mimecast)
> don't mind being retried in a finite time (e.g. 15 mins).
>

mh, ok so is your problem the fact that it's a quadratic increase or the
fact that for some hosts a constant delay is preferable ?

> > To get round these problems, we have been playing with 'smtpctl schedule'
> > and restarting the server. In particular, with Yahoo we can see this
> > behaviour:
> >
> > * messages not being delivered;
> > * schedule the messages -- still not being delivered;
> > * stop and start the daemon;
> > * schedule the messages and they are delivered;
> >
> > Problem with this is that the routing information is runtime information
> > which gets lost across a restart.
> > If you're having problems with the default limits, we should really
> > understand what is the problem, what limits should be used and then make
> > them the default.
> >
>
> I completely agree and am happy to try things. Again to go back into ancient
> history, the zmailer boilerplate scheduler.conf file used to have examples of
> different strategies (which are mentioned in the manual page above and also
> on the old zmailer website).
>

I understand, you should open a feature request on the bug tracker so that
it can be discussed technically and not forgotten.

It is a non-trivial issue, we will clearly not add knobs to tweak every
aspect of the scheduler since that goes against our goals, but there surely
are ways to improve the current state for large volumes.

> Again will do -- I did also note that I can set system wide limits by
> removing [domain etc]. I guess also there ought to be a feature request to
> put these config options in the manual page (not everyone wants to read
> limit.c).
>

well, we hope for people not to tweak limits, we primarily added them to
help us with finding proper generic limits, only very limited use cases
should require tweaking them and we'd prefer these to be limited to people
that truly understand what they imply :-)

> > Any help, gratefully received -- particularly if it stops me from having to
> > dive further into the code!
> >
> >
> > What would really help us help you is to provide some figures:
> >
> > - what version of OpenSMTPD are you using ?
>
> I think we are on opensmtpd-201310101759p1 on all machines. I take the
> approach of downloading the most current version, try it on one machine and
> then push it to others. We are running on CentOS 6.
>

nice, we usually run recent snapshots in production after letting them run
for a while on test machines.

> > - what kind of volumes are you sending ?
>
> About 100K per weekday -- less at weekends.
>

ok, not as high as I expected then

> > - how many sources are you using ?
>
> Do you mean how many machines? Can I be evasive here and say a few.
>

I was more referring to how many IP addresses, based on the limits you
applied I assumed you had a single machine with many IP addresses.

> > - over how much time are you sending these volumes ?
> >
> It tends to be very bursty -- people like their newsletters to land very soon
> after they push the send button.
>

Ok, so basically when you have 100K mails, you just enqueue them right away
and do not dilute them in time, this could be the difference between your
use-case and mine, and it could be a hint to why the mechanisms do not work
for you when they do for me, we'll discuss this with eric to see how a
burst of volume might cause them to fail.

> > We know of systems that use the default limits to exchange hundreds of
> > thousands of mails with gmail / yahoo / hotmail daily, I'm curious about
> > what could break your setup :-)
> >
> Gmail is not an issue -- it happily swallows everything we send at it.
>
> Hotmail and Yahoo are still seeing some issues. Mimecast 'grey listing'
> remains a problem, but the fact we cannot use SPF or DKIM with that client
> may also be a contributing factor. Mimecast can be solved by the 'stop,
> start and reschedule' which satisfies their grey listing.
>

We already discussed with eric alternatives to quadratic delaying, we may
need to find a different function with a curve that would slowly increase
to 15m and stay a while there before increasing again, it was just not that
urgent so we didn't investigate much.

-- 
Gilles Chehade

https://www.poolp.org                                          @poolpOrg
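
One possible shape for the alternative curve mentioned above, purely as a sketch: ramp up quadratically until the delay would exceed 15 minutes, hold at 15 minutes for a fixed number of attempts, then resume the quadratic growth up to the four-hour maximum. Nothing here comes from OpenSMTPD; the 10-second base step, the hold length of 8 attempts and the name `plateau_delay` are assumptions chosen only to show the idea.

```python
from datetime import timedelta

PLATEAU = timedelta(minutes=15)   # plateau value mentioned in this thread
MAX_DELAY = timedelta(hours=4)    # maximum delay quoted in this thread
BASE = timedelta(seconds=10)      # hypothetical base step
HOLD = 8                          # hypothetical number of attempts spent at the plateau


def plateau_delay(failures: int, hold: int = HOLD) -> timedelta:
    """Quadratic ramp, then a plateau of `hold` attempts, then quadratic again."""
    if failures < 0:
        raise ValueError("failures must be non-negative")

    # Phase 1: plain quadratic ramp while it stays at or below the plateau.
    ramp = BASE * failures * failures
    if ramp <= PLATEAU:
        return ramp

    # Phase 2: stay at the plateau until the curve, shifted right by `hold`
    # attempts, would itself climb above the plateau.
    shifted = max(failures - hold, 0)
    resumed = BASE * shifted * shifted
    if resumed <= PLATEAU:
        return PLATEAU

    # Phase 3: continue the shifted quadratic curve, capped at the maximum.
    return min(resumed, MAX_DELAY)


if __name__ == "__main__":
    # Print the delay around the ramp, the plateau and the resumed growth.
    for n in (1, 5, 9, 10, 17, 18, 30, 46):
        print(n, plateau_delay(n))
```

With these assumed values the delay sits at 15 minutes from the 10th to the 17th consecutive failure, which gives a host that accepts retries after roughly 15 minutes several chances before the delay grows again.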
