RE: [courier-users] Re: 400 Service temporarily unavailable.
Hello,
You were right, Sam. We run courier restart every time we add a new "hosteddomain". I'm sorry I didn't realize it before. What I knew is that "makehosteddomains requires a 'courier restart'. makeacceptmailfor doesn't." (you told the list on 2002-12-25), so every time a new host is provisioned in our systems, our scripts run "makehosteddomains" and afterwards "courier restart". (It has to be done this way, doesn't it?)
Correct.
So this clarifies the "SHUTDOWN: Restarting..." messages. Thank you, Sam.
I am still worried about the "400 Service temporarily unavailable" errors, which are quite frequent: they appear every 5-10 minutes. They are logged by courieresmtpd, both when receiving local mail and when handling outgoing mail. This is why I think the problem has to do with LDAP aliasing. Yesterday we increased the number of courierldapaliasd processes to 10, but the problem remains.
As I stated, I found a file descriptor leak in the 'courierldapaliasd restart'. Increasing the number of processes is not going to help.
What is the difference between the 400 and 450 "Service temp..." errors? Both happen when contacting LDAP, don't they?
Maybe the OpenLDAP servers cannot handle all those connections?
We are balancing ldap requests to the 2 ldap servers via SSR routers. Maybe the SSRs are dropping some connections or requests?
It's possible. This is, however, something that you'll need to investigate on your own.
Any ideas, please? (The current version is 0.39.3, which is quite old, so we plan to upgrade next week...)
Thanks for your help.
An example from the last hour (I've removed some info):
May 29 16:45:47 m3lnxsva01 courieresmtpd: error,relay=::xxx: 400 Service temporarily unavailable.
This diagnostic is indeed issued when an alias lookup fails.
You should eliminate the possibility of this being a side effect of the leak by applying the one-line patch that plugs it. If you're still seeing this after applying the patch, it points the finger directly at the lookups themselves. Increasing the number of daemons would help if the failure were due to insufficient software resources (server slow to respond). If increasing the number of daemons makes no difference, look at the hardware.
What happens is that if the connection to the LDAP server breaks for some reason, courierldapaliasd waits one minute before trying to connect again. Until that happens, all alias requests are deferred in this manner. So, if a network glitch broke the connection, aliasing will not work for one minute.
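One way to check whether the logged errors fit this pattern is to group them into bursts and see how long each burst lasts. The following is only a rough sketch: the regular expression is modelled on the single sample line quoted above, the function names are made up for illustration, and the year is an arbitrary placeholder because syslog timestamps do not carry one.

```python
import re
from datetime import datetime, timedelta

# Pattern modelled on the sample log line quoted above (an assumption:
# other syslog layouts may need a different prefix).
LINE_RE = re.compile(
    r"^(?P<ts>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ courieresmtpd: "
    r".*400 Service temporarily unavailable"
)


def parse_times(lines, year=2000):
    """Return the timestamps of lines that carry the 400 deferral.

    `year` is a placeholder; only the gaps between timestamps matter here.
    """
    times = []
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            stamp = f"{year} {match.group('ts')}"
            times.append(datetime.strptime(stamp, "%Y %b %d %H:%M:%S"))
    return times


def group_bursts(times, gap=timedelta(seconds=60)):
    """Group timestamps into bursts; a new burst starts after a pause longer than `gap`."""
    bursts = []
    for t in sorted(times):
        if bursts and t - bursts[-1][-1] <= gap:
            bursts[-1].append(t)
        else:
            bursts.append([t])
    return bursts
```

Feeding the mail log through `parse_times` and then `group_bursts` gives one list per burst. Bursts that each span roughly a minute would be consistent with a broken connection followed by the one-minute wait; errors spread evenly over time would suggest something else.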
This is because older versions of the OpenLDAP client libraries leak memory if a connection request fails. To prevent all memory from being leaked, ldapaliasd restarts itself if the connection fails. To prevent the machine from forkbombing itself if the LDAP server is down, a one minute timer runs during which all alias requests are deferred. Since this is a temporary deferral, no mail will actually be lost, since the sender will try again later. After the one minute timer runs out, ldapaliasd restarts itself.
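The defer-then-restart behaviour described above can be modelled with a small toy class. This is not Courier's code, only a sketch of the logic as described: the class name, the `connect` callable, the injected clock, and the `restarts` counter are all hypothetical, and the "restart" is simulated by discarding the connection state rather than re-executing a process.

```python
import time

DEFER_SECONDS = 60  # the one-minute timer described above


class AliasLookup:
    """Toy model of the deferral timer; not Courier's actual implementation."""

    def __init__(self, connect, clock=time.monotonic):
        # `connect` is a hypothetical callable: it returns a lookup function
        # or raises ConnectionError when the LDAP server cannot be reached.
        self._connect = connect
        self._clock = clock
        self._conn = None
        self._defer_until = None
        self.restarts = 0

    def _start_timer(self, now):
        self._conn = None
        self._defer_until = now + DEFER_SECONDS
        return ("defer", None)

    def lookup(self, address):
        now = self._clock()
        if self._defer_until is not None:
            if now < self._defer_until:
                # Timer still running: temporary failure, the sender retries later.
                return ("defer", None)
            # Timer expired: simulate the restart by dropping all state.
            self._defer_until = None
            self._conn = None
            self.restarts += 1
        if self._conn is None:
            try:
                self._conn = self._connect()
            except ConnectionError:
                return self._start_timer(now)
        try:
            return ("ok", self._conn(address))
        except ConnectionError:
            # The connection broke while in use.
            return self._start_timer(now)
```

The point of the model is that a single failed connection makes every lookup return a deferral until the timer expires, regardless of how many requests arrive in the meantime, which is why a brief network glitch shows up as a run of temporary errors rather than a single one.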
Before upgrading Courier, you should also upgrade your OpenLDAP client libraries to a more recent version in which this leak is plugged, compile Courier against the new version of OpenLDAP, and add LDAP_MEMORY_LEAK=0 to ldapaliasrc.
