As a side note, I'd strongly advise you to join our IRC channel: it's much
easier to discuss this and ask/answer questions by direct exchange than by
mail :-)

On Thu, Oct 17, 2013 at 01:20:35PM +0200, Gilles Chehade wrote:
> On Thu, Oct 17, 2013 at 11:10:18AM +0100, Simon Brock wrote:
> > Gilles
> > Thanks for the reply -- I have trimmed out some bits and answered your
> > questions.
> > 
> > [...]
> >
> > > This is very likely to get you blacklisted very fast, you're essentially 
> > > telling OpenSMTPD to blast the hotmail servers by making tons of 
> > > connections and by not introducing any delay between your transactions.
> > > While this might work for a little while or with little volume, they're 
> > > going to give your IP addresses and domain a very bad reputation.
> > >  
> > I am not completely sure this is a problem. We have historically used much
> > bigger numbers than this without an issue. We had a busy day yesterday so
> > I turned this into a default configuration and had no particular problems 
> > (except for one machine).
> >
> 
> Your call :-)
> 
> 
> > > The problem to be solved is how these services create temporary failures 
> > > -- this is the service telling you to go away and then seeing when you 
> > > come back
> > > 
> > > 
> > > Actually it is a lot more complex than that, OpenSMTPD keeps a routing 
> > > table and keeps track of failures for every route it has managed to 
> > > establish.
> > > It can even differentiate between different kinds of failures (no MX
> > > available, MX are available but do not accept mail, MX has started to
> > > produce many errors in a row, ...).
> > > When a provider produces many temporary failures in a row, OpenSMTPD will
> > > assume the MX is having a temporary failure and will mark it unavailable
> > > for a while, using a quadratic delay strategy plus a penalty on the
> > > envelopes that temporarily failed, so that they aren't retried in the
> > > same order in case this was just a coincidence.
> > > 
> > And the standard quadratic back off maximum delay is four hours.
> >
> > To compare with old zmailer 
> > (http://www.zmailer.org/man/scheduler.8zm.html), we could set retry 
> > intervals (e.g. interval and retries -- see the examples). I can understand 
> > the reasoning behind quadratic but in the worst case scenario, it will not 
> > detect that a remote system is back for four hours.
> > 
> 
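To make the trade-off concrete, here is a minimal Python sketch comparing a
capped quadratic back off with a zmailer-style constant interval. The
four-hour cap and the 15-minute interval are the figures quoted in this
thread; the 15-second base, the function names and the exact formula are
assumptions for illustration, not what OpenSMTPD actually implements.

```python
MAX_DELAY = 4 * 60 * 60  # four-hour cap mentioned in the thread


def quadratic_delay(failures, base=15, max_delay=MAX_DELAY):
    """Seconds to wait after `failures` consecutive failures.

    `base` is a hypothetical value, not taken from OpenSMTPD.
    """
    return min(base * failures * failures, max_delay)


def constant_delay(failures, interval=15 * 60):
    """Fixed retry interval, independent of the failure count."""
    return interval


def schedule(delay_fn, attempts):
    """Cumulative seconds since the first failure at which each retry fires."""
    elapsed, times = 0, []
    for n in range(1, attempts + 1):
        elapsed += delay_fn(n)
        times.append(elapsed)
    return times
```

Calling schedule(quadratic_delay, n) and schedule(constant_delay, n) side by
side shows the difference: the quadratic curve retries early and often, then
the gaps grow until they hit the cap, while the constant strategy never waits
longer than its interval to notice that a host is back.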
> Actually, it is more complex than that:
> 
> - there are two kinds of delays, the network / 421 ones will impose a
>   quadratic back off
> 
> - the "too many protocol-level errors" ones will impose a biased delaying so
>   that there are more attempts at lesser intervals IIRC (need to double
>   check, we did a lot of work there)
> 
> - the routing table keeps track of temporary errors and envelopes that have
>   failed so that if an envelope is accepted by a host that was down,
>   envelopes that were failed and had their delay increased can be
>   reattempted right away.
> 
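As a rough illustration of the third point in the list above, the bookkeeping
could look like the Python sketch below. The Route class, its fields and its
method names are hypothetical and only mirror the behaviour described in the
list; the real logic lives in OpenSMTPD's C code and is more involved.

```python
from dataclasses import dataclass, field


@dataclass
class Route:
    """Hypothetical per-route state: failure count plus held-back envelopes."""
    failures: int = 0
    delayed: list = field(default_factory=list)  # envelope ids with increased delay

    def record_failure(self, envelope_id):
        """A delivery attempt on this route failed temporarily."""
        self.failures += 1
        self.delayed.append(envelope_id)

    def record_success(self):
        """The host accepted an envelope again.

        Reset the failure count and return the envelopes that were held
        back, so the caller can reattempt them right away.
        """
        ready, self.delayed = self.delayed, []
        self.failures = 0
        return ready
```

The point of keeping the list of delayed envelopes on the route is that a
single successful delivery is enough evidence to release all of them, without
waiting for each envelope's own delay to expire.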
> There's still work to do in that area, it is not impossible that in some
> situations the various mechanisms don't play too well together, which is why
> I need to figure out what goes wrong for you.
> 
> 
> > I guess my question is do I have a way of making OpenSMTPD put an MX into 
> > temporary failure sooner rather than later.
> >
> 
> Well, it depends on what you're trying to do, I don't really understand it 
> yet.
> If you have an MX that was disabled and you want to make it routable again,
> we have a way with smtpctl to reenable a disabled route. We don't have a way
> to disable an active MX, but this can be done quite easily, we just didn't
> have a use-case.
> 
> 
> > Also the 'different order' can cause interesting problems -- newer emails 
> > in the queue overtaking older ones.
> > 
> 
> Not really, it only affects a very small and bounded number of envelopes in
> a batch, and it affects them by imposing a delay that gets compensated for.
> 
> If you ever enter a state where the mechanism kicks in, in the worst case
> scenario you will have 10 envelopes that will have been delayed by a few
> seconds, and in the best scenario this prevents you from hammering YaMoo
> if they expect you to wait.
> 
> You should really read the code in case you're interested in that logic, it's
> quite difficult to explain by mail without making charts to describe the state
> of the scheduler at each quadratic step (I can, but not today as I'm a bit too
> busy at the moment).
> 
> 
> > > Similarly Yahoo and others will stop accepting connections if they think 
> > > you are sending too many messages. At this point, OpenSMTPD backs away 
> > > and the quadratic back off cuts in which increases the retry time. 
> > > Unfortunately these services will accept messages in a finite time.
> > > 
> > > 
> > > I don't quite get the "accept messages in a finite time" part.
> > 
> > Yahoo doesn't seem to mind being contacted later -- it seems that
> > OpenSMTPD backs off quadratically while Yahoo (and others such as Mimecast)
> > don't mind being retried in a finite time (e.g. 15 mins).
> >
> 
> mh, ok so is your problem the fact that it's a quadratic increase or the fact
> that for some hosts a constant delay is preferable ?
> 
> 
> > > To get round these problems, we have been playing with 'smtpctl schedule' 
> > > and restarting the server. In particular, with Yahoo we can see this 
> > > behaviour:
> > > 
> > > * messages not being delivered;
> > > * schedule the messages -- still not being delivered;
> > > * stop and start the daemon;
> > > * schedule the messages and they are delivered;
> > > 
> > > Problem with this is that the routing information is runtime information 
> > > which gets lost across a restart.
> > > If you're having problems with the default limits, we should really 
> > > understand what is the problem, what limits should be used and then make 
> > > them the default.
> > >  
> > 
> > I completely agree and am happy to try things. Again to back into ancient 
> > history, the zmailer boilerplate scheduler.conf file used to have examples 
> > of different strategies (which are mentioned in the manual page above and 
> > also on the old zmailer website).
> > 
> 
> I understand, you should open a feature request on the bug tracker so that it
> can be discussed technically and not forgotten.
> 
> It is a non-trivial issue, we will clearly not add knobs to tweak every aspect
> of the scheduler since that goes against our goals, but there surely are ways
> to improve the current state for large volumes.
> 
> 
> > Again will do -- I did also note that I can set system wide limits by 
> > removing [domain etc]. I guess also there ought to be a feature request to 
> > put these config options in the manual page (not everyone wants to read 
> > limit.c).
> > 
> 
> well, we hope for people not to tweak limits, we primarily added them to help
> us with finding proper generic limits, only very limited use cases should
> require tweaking them and we'd prefer these to be limited to people that truly
> understand what they imply :-)
> 
> 
> 
> > > Any help, gratefully received -- particularly if it stops me from having 
> > > to dive further into the code!
> > > 
> > > 
> > > What would really help us help you is to provide some figures:
> > > 
> > > - what version of OpenSMTPD are you using ?
> > 
> > I think we are on opensmtpd-201310101759p1 on all machines. I take the
> > approach of downloading the most current version, trying it on one machine
> > and then pushing it to the others. We are running on CentOS 6.
> >
> 
> nice, we usually run recent snapshots in production after letting them run
> for a while on test machines.
> 
> 
> > > - what kind of volumes are you sending ?
> > 
> > About 100K per weekday -- less at weekends.
> >
> 
> ok, not as high as I expected then
> 
> 
> > > - how many sources are you using ?
> > 
> > Do you mean how many machines? Can I be evasive here and say a few.
> > 
> 
> I was more referring to how many IP addresses, based on the limits you applied
> I assumed you had a single machine with many IP addresses.
> 
> 
> > > - over how much time are you sending these volumes ?
> > > 
> > It tends to be very bursty -- people like their newsletters to land very 
> > soon after they push the send button. 
> > 
> 
> Ok, so basically when you have 100K mails, you just enqueue them right away
> and do not spread them out over time. This could be the difference between
> your use-case and mine, and it could be a hint as to why the mechanisms do
> not work for you when they do for me. We'll discuss this with eric to see
> how a burst of volume might cause them to fail.
> 
> 
> > > We know of systems that use the default limits to exchange hundreds of 
> > > thousands of mails with gmail / yahoo / hotmail daily, I'm curious about 
> > > what could break your setup :-)
> > > 
> > Gmail is not an issue -- it happily swallows everything we send at it.
> > 
> > Hotmail and Yahoo are still seeing some issues. Mimecast 'grey listing'
> > remains a problem, but the fact that we cannot use SPF or DKIM with that
> > client may also be a contributing factor. Mimecast can be solved by the
> > 'stop, start and reschedule' which satisfies their grey listing.
> > 
> 
> We already discussed alternatives to quadratic delaying with eric. We
> may need to find a different function with a curve that would slowly
> increase to 15m and stay there a while before increasing again; it was
> just not that urgent so we didn't investigate much.
> 
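For illustration only, one possible shape for such a curve is sketched below
in Python: a quadratic ramp up to 15 minutes, a plateau for a few attempts,
then a second quadratic ramp up to the four-hour cap. The function name, the
60-second base and the eight-attempt hold are assumptions picked to make the
shape visible, not values anyone has agreed on.

```python
PLATEAU = 15 * 60        # 15-minute plateau discussed in the thread
MAX_DELAY = 4 * 60 * 60  # four-hour cap discussed in the thread


def plateau_delay(attempt, base=60, hold=8):
    """Hypothetical retry delay in seconds for the given attempt number.

    `base` and `hold` are illustrative assumptions.
    """
    ramp = base * attempt * attempt
    if ramp < PLATEAU:
        return ramp  # first ramp: plain quadratic growth
    # first attempt at which the quadratic ramp reaches the plateau
    first = next(n for n in range(1, attempt + 1) if base * n * n >= PLATEAU)
    if attempt < first + hold:
        return PLATEAU  # stay at 15 minutes for `hold` attempts
    # second ramp: quadratic growth restarting from the plateau, capped
    extra = attempt - (first + hold) + 1
    return min(PLATEAU + base * extra * extra, MAX_DELAY)
```

With these values a host that only needs a pause of about 15 minutes gets
several evenly spaced retries at that interval before the delay starts
growing again.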
> 
> -- 
> Gilles Chehade
> 
> https://www.poolp.org                                          @poolpOrg
> 
> -- 
> You received this mail because you are subscribed to [email protected]
> To unsubscribe, send a mail to: [email protected]
> 

-- 
Gilles Chehade

https://www.poolp.org                                          @poolpOrg
