As a side note, I'd totally advise you to join our IRC channel, it's way easier to discuss and ask/answer this by direct exchange than by mail :-)
On Thu, Oct 17, 2013 at 01:20:35PM +0200, Gilles Chehade wrote:
> On Thu, Oct 17, 2013 at 11:10:18AM +0100, Simon Brock wrote:
> > Gilles
> > Thanks for the reply -- have trimmed out some bits and answered your
> > questions.
> >
> > [...]
> >
> > > This is very likely to get you blacklisted very fast, you're
> > > essentially telling OpenSMTPD to blast the hotmail servers by making
> > > tons of connections and by not introducing any delay between your
> > > transactions. While this might work for a little while or with little
> > > volume, they're going to give your IP addresses and domain a very bad
> > > reputation.
> >
> > I am not completely sure this is a problem. We have historically used
> > much bigger numbers than this without an issue. We had a busy day
> > yesterday, so I turned this into a default configuration and had no
> > particular problems (except for one machine).
>
> Your call :-)
>
> > > The problem to be solved is how these services create temporary
> > > failures -- this is the service telling you to go away and then
> > > seeing when you come back.
> > >
> > > Actually it is a lot more complex than that, OpenSMTPD keeps a
> > > routing table and keeps track of failures for every route it has
> > > managed to establish. It can even differentiate between different
> > > kinds of failures (no MX available, MX available but not accepting
> > > mail, MX has started to produce many errors in a row, ...).
> > > When a provider produces many temporary failures in a row, OpenSMTPD
> > > will assume the MX is having a temporary failure and will mark it
> > > unavailable for a while using a quadratic delay strategy, plus a
> > > penalty on the envelopes that were temporarily failed so that they
> > > aren't retried in the same order in case this was just a coincidence.
> >
> > And the standard quadratic back off maximum delay is four hours.
> >
> > To compare with old zmailer
> > (http://www.zmailer.org/man/scheduler.8zm.html), we could set retry
> > intervals (e.g. interval and retries -- see the examples). I can
> > understand the reasoning behind quadratic, but in the worst case
> > scenario it will not detect that a remote system is back for four
> > hours.
>
> Actually, it is more complex than that:
>
> - there are two kinds of delays: the network / 421 ones will impose a
>   quadratic back off
>
> - the "too many protocol-level errors" ones will impose a biased delaying
>   so that there are more attempts at lesser intervals IIRC (need to
>   double check, we did a lot of work there)
>
> - the routing table keeps track of temporary errors and envelopes that
>   have failed, so that if an envelope is accepted by a host that was
>   down, envelopes that were failed and had their delay increased can be
>   reattempted right away.
>
> There's still work to do in that area; it is not impossible that in some
> situations the various mechanisms don't play too well together, which is
> why I need to figure out what goes wrong for you.
>
> > I guess my question is: do I have a way of making OpenSMTPD put an MX
> > into temporary failure sooner rather than later?
>
> Well, it depends on what you're trying to do, I don't really understand
> it yet. If you have an MX that was disabled and you want to make it
> routable again, we have a way with smtpctl to reenable a disabled route.
> We don't have a way to disable an active MX, but this can be done quite
> easily, we just didn't have a use-case.
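For anyone hitting the same situation, this is roughly what the smtpctl side
of it looks like from the command line. Only 'smtpctl schedule' is actually
named in this thread; the other spellings below are guesses and should be
checked against smtpctl(8) for the snapshot you run.

    # inspect what the MTA's routing table currently thinks of the remote
    # MXs (subcommand names vary between snapshots -- check smtpctl(8))
    smtpctl show routes

    # hypothetical spelling of the "reenable a disabled route" command
    # mentioned above
    smtpctl resume route <routeid>

    # list stuck envelopes and force them to be retried right away
    smtpctl show queue
    smtpctl schedule <evpid>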
> > Also the 'different order' can cause interesting problems -- newer
> > emails in the queue overtaking older ones.
>
> Not really, it only affects a very small and bounded number of envelopes
> in a batch, and it affects them by imposing a delay that gets compensated
> for.
>
> If you ever enter a state where the mechanism kicks in, in the worst case
> scenario you will have 10 envelopes that will have been delayed by a few
> seconds, and in the best scenario this prevents you from hammering YaMoo
> if they expect you to wait.
>
> You should really read the code in case you're interested in that logic;
> it's quite difficult to explain by mail without making charts to describe
> the state of the scheduler at each quadratic step (I can, but not today
> as I'm a bit too busy at the moment).
>
> > > Similarly Yahoo and others will stop accepting connections if they
> > > think you are sending too many messages. At this point, OpenSMTPD
> > > backs away and the quadratic back off cuts in, which increases the
> > > retry time. Unfortunately these services will accept messages in a
> > > finite time.
> > >
> > > I don't quite get the "accept messages in a finite time" part.
> >
> > Yahoo doesn't seem to mind being contacted later -- it seems to be that
> > OpenSMTPD backs off quadratically while Yahoo (and others such as
> > Mimecast) don't mind being retried in a finite time (e.g. 15 mins).
>
> mh, ok so is your problem the fact that it's a quadratic increase, or the
> fact that for some hosts a constant delay is preferable?
>
> > > To get round these problems, we have been playing with 'smtpctl
> > > schedule' and restarting the server. In particular, with Yahoo we can
> > > see this behaviour:
> > >
> > > * messages not being delivered;
> > > * schedule the messages -- still not being delivered;
> > > * stop and start the daemon;
> > > * schedule the messages and they are delivered.
> > >
> > > Problem with this is that the routing information is runtime
> > > information which gets lost across a restart.
> > > If you're having problems with the default limits, we should really
> > > understand what the problem is, what limits should be used, and then
> > > make them the default.
> >
> > I completely agree and am happy to try things. Again, to go back into
> > ancient history, the zmailer boilerplate scheduler.conf file used to
> > have examples of different strategies (which are mentioned in the
> > manual page above and also on the old zmailer website).
>
> I understand, you should open a feature request on the bug tracker so
> that it can be discussed technically and not forgotten.
>
> It is a non-trivial issue, we will clearly not add knobs to tweak every
> aspect of the scheduler since that goes against our goals, but there
> surely are ways to improve the current state for large volumes.
>
> > Again, will do -- I did also note that I can set system-wide limits by
> > removing [domain etc]. I guess also there ought to be a feature request
> > to put these config options in the manual page (not everyone wants to
> > read limit.c).
>
> well, we hope for people not to tweak limits, we primarily added them to
> help us with finding proper generic limits; only very limited use cases
> should require tweaking them and we'd prefer these to be limited to
> people that truly understand what they imply :-)
>
> > > Any help, gratefully received -- particularly if it stops me from
> > > having to dive further into the code!
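For context, the per-domain limits being discussed here are expressed in
smtpd.conf as "limit mta" rules. The fragment below is only a sketch: the
key names live in limit.c, are not all documented, and may differ between
snapshots, so treat "max-conn-per-host" and the values as assumptions to
verify rather than recommended settings.

    # throttle connections to a picky provider (sketch -- verify the key
    # names against limit.c for the version you run)
    limit mta for domain "hotmail.com" max-conn-per-host 5

    # the same rule without "for domain" applies system-wide
    limit mta max-conn-per-host 10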
> > > What would really help us help you is to provide some figures:
> > >
> > > - what version of OpenSMTPD are you using ?
> >
> > I think we are on opensmtpd-201310101759p1 on all machines. I take the
> > approach of downloading the most current version, trying it on one
> > machine and then pushing it to the others. We are running on CentOS 6.
>
> nice, we usually run recent snapshots in production after letting them
> run for a while on test machines.
>
> > > - what kind of volumes are you sending ?
> >
> > About 100K per weekday -- less at weekends.
>
> ok, not as high as I expected then
>
> > > - how many sources are you using ?
> >
> > Do you mean how many machines? Can I be evasive here and say a few?
>
> I was more referring to how many IP addresses; based on the limits you
> applied I assumed you had a single machine with many IP addresses.
>
> > > - over how much time are you sending these volumes ?
> >
> > It tends to be very bursty -- people like their newsletters to land
> > very soon after they push the send button.
>
> Ok, so basically when you have 100K mails, you just enqueue them right
> away and do not dilute them in time. This could be the difference between
> your use-case and mine, and it could be a hint as to why the mechanisms
> do not work for you when they do for me; we'll discuss this with Eric to
> see how a burst of volume might cause them to fail.
>
> > > We know of systems that use the default limits to exchange hundreds
> > > of thousands of mails with gmail / yahoo / hotmail daily, I'm curious
> > > about what could break your setup :-)
> >
> > Gmail is not an issue -- it happily swallows everything we send at it.
> >
> > Hotmail and Yahoo are still seeing some issues. Mimecast 'grey listing'
> > remains a problem, but the fact that we cannot use SPF or DKIM with
> > that client may also be a contributing factor. Mimecast can be solved
> > by the 'stop, start and reschedule', which satisfies their grey
> > listing.
>
> We already discussed alternatives to quadratic delaying with Eric; we may
> need to find a different function with a curve that would slowly increase
> to 15m and stay a while there before increasing again. It was just not
> that urgent, so we didn't investigate much.
>
> --
> Gilles Chehade
>
> https://www.poolp.org @poolpOrg

--
Gilles Chehade

https://www.poolp.org @poolpOrg
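To make the retry curves being compared above concrete, here is a small
standalone illustration. It is not OpenSMTPD's scheduler code and the
constants are invented: it just prints a plain quadratic back off capped at
four hours next to a hypothetical curve of the kind described above, one
that climbs to roughly 15 minutes, holds there for a few attempts, then
grows again.

    /* Not OpenSMTPD's scheduler code: a standalone illustration of the
     * two retry curves discussed in the thread, with invented constants. */
    #include <stdio.h>

    #define CAP (4 * 60 * 60)   /* cap every schedule at 4 hours (seconds) */

    /* plain quadratic back off: the delay grows with the square of the
     * attempt number and is capped at 4 hours */
    static int
    quadratic(int attempt)
    {
        int d = 60 * attempt * attempt;     /* 1m, 4m, 9m, 16m, ... */

        return d > CAP ? CAP : d;
    }

    /* hypothetical "plateau" curve: ramp up quickly, hold around 15
     * minutes for a few attempts, then resume growing */
    static int
    plateau(int attempt)
    {
        int d;

        if (attempt <= 4)
            d = 60 * attempt * attempt;     /* 1m .. 16m */
        else if (attempt <= 8)
            d = 15 * 60;                    /* hold at 15m */
        else
            d = 60 * (attempt - 4) * (attempt - 4);

        return d > CAP ? CAP : d;
    }

    int
    main(void)
    {
        int i;

        printf("%7s %14s %12s\n", "attempt", "quadratic (s)", "plateau (s)");
        for (i = 1; i <= 12; i++)
            printf("%7d %14d %12d\n", i, quadratic(i), plateau(i));
        return 0;
    }

With these made-up constants, the quadratic schedule is already waiting more
than an hour between attempts by the eighth retry, while the plateau curve
keeps knocking every 15 minutes through the window in which a provider like
Yahoo would normally start accepting mail again.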
