Garcia Martinez, Raul Lorenzo writes:

RE: [courier-users] Re: 400 Service temporarily unavailable.

Hello,

you were right, Sam. We run "courier restart" every time we add a new
"hosteddomain". I'm sorry I didn't realize this before. What I did know is
that "makehosteddomains requires a 'courier restart'. makeacceptmailfor
doesn't." (you told the list on 2002-12-25), so every time a new host is
provisioned in our systems, our scripts run "makehosteddomains" and
afterwards "courier restart". (This has to be done this way, doesn't it?)

Correct.
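A minimal Python sketch of the provisioning order described above, assuming that makehosteddomains and courier are on the PATH and that the script runs with enough privileges to restart Courier. The names provision_hosted_domain and run_checked are hypothetical. The command runner is injectable so the ordering can be checked without touching a live server.

```python
import subprocess
from typing import Callable, Sequence

# Hypothetical helper names; only the two commands come from the thread.
Runner = Callable[[Sequence[str]], None]


def run_checked(cmd: Sequence[str]) -> None:
    """Run a command, raising CalledProcessError if it exits non-zero."""
    subprocess.run(list(cmd), check=True)


def provision_hosted_domain(run: Runner = run_checked) -> None:
    # Rebuild the hosteddomains database after the domain list was edited.
    run(["makehosteddomains"])
    # Per the thread, makehosteddomains needs a restart to take effect
    # (makeacceptmailfor does not).
    run(["courier", "restart"])
```

Because run_checked raises on failure, the restart is skipped if makehosteddomains itself fails.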



So this clarifies the "SHUTDOWN: Restarting..." messages. Thank you, Sam.


I am still worried about the "400 Service temporarily unavailable" errors,
which are quite frequent: they appear every 5-10 minutes.
They are logged by courieresmtpd both for incoming local mail and for
outgoing mail, which is why I think the problem has to do with LDAP
aliasing. Yesterday we increased the number of courierldapaliasd processes
to 10, but the problem remains.

As I stated, I found a file descriptor leak in the 'courierldapaliasd restart' code path. Increasing the number of processes is not going to help.
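One way to see whether a descriptor leak is accumulating, before and after applying a fix, is to sample the number of open descriptors of a courierldapaliasd process over time. This is a minimal sketch assuming Linux's /proc filesystem; the PID has to be found separately (for example with ps), and reading another user's /proc/<pid>/fd usually requires root. The function names are hypothetical.

```python
import os
import time
from typing import List


def open_fd_count(pid: int) -> int:
    """Count open file descriptors of a process (Linux-specific /proc)."""
    # Each entry in /proc/<pid>/fd is one open descriptor.
    return len(os.listdir(f"/proc/{pid}/fd"))


def sample_fd_counts(pid: int, samples: int, interval_s: float) -> List[int]:
    """Take `samples` readings, sleeping `interval_s` seconds between them."""
    counts = []
    for i in range(samples):
        counts.append(open_fd_count(pid))
        if i + 1 < samples:
            time.sleep(interval_s)
    return counts
```

A count that keeps climbing across LDAP reconnects, rather than levelling off, would be consistent with a leak.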


What is the difference between the 400 and 450 "Service temp..." errors? Both happen when contacting LDAP, don't they?

Maybe the OpenLDAP servers cannot handle all those connections?

We are balancing LDAP requests across the 2 LDAP servers via SSR routers.
Maybe the SSRs are dropping some connections or requests?

It's possible. This is, however, something that you'll need to investigate on your own.
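As a first step in that investigation, a periodic TCP connect probe through the load balancer can show whether connections are being refused or are timing out. This minimal sketch only tests TCP reachability on port 389 (the standard LDAP port); it does not perform an LDAP bind or search. The function name and any hostnames you pass to it are hypothetical.

```python
import socket
import time
from typing import Optional


def probe_tcp(host: str, port: int = 389, timeout_s: float = 5.0) -> Optional[float]:
    """Return the TCP connect time in seconds, or None if the connect fails."""
    start = time.monotonic()
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout_s):
            return time.monotonic() - start
    except OSError:
        return None
```

Running it every few seconds from the mail server and logging the None results would show whether failures line up with the 400 errors in syslog.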

Any ideas?
(The current version is 0.39.3, which is quite old, so we have planned an
upgrade for next week...)

Thanks for your help.

An example from the last hour (I've removed some information):

May 29 16:45:47 m3lnxsva01 courieresmtpd: error,relay=::xxx: 400 Service temporarily unavailable.

This diagnostic is indeed issued when an alias lookup fails.
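To put a number on "every 5-10 minutes", the syslog lines carrying this diagnostic can be extracted and the gaps between them measured. A minimal sketch assuming the format of the log line above, written on a single line as syslog does (the message wraps only in the email). Syslog omits the year, so the caller supplies it. The pattern and function names are hypothetical, and the log file location is up to your syslog configuration.

```python
import re
from datetime import datetime
from typing import Iterable, List

# Syslog prefix plus the courieresmtpd "400" diagnostic; an optional [pid]
# after the program name is accepted.
PATTERN = re.compile(
    r"^(?P<stamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ "
    r"courieresmtpd(\[\d+\])?: .*400 Service temporarily unavailable"
)


def error_times(lines: Iterable[str], year: int) -> List[datetime]:
    """Timestamps of all lines carrying the 400 diagnostic."""
    times = []
    for line in lines:
        m = PATTERN.match(line)
        if m:
            stamp = " ".join(m.group("stamp").split())  # "May  9" -> "May 9"
            times.append(datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S"))
    return times


def gaps_seconds(times: List[datetime]) -> List[float]:
    """Seconds between consecutive errors."""
    return [(b - a).total_seconds() for a, b in zip(times, times[1:])]
```

Errors arriving in bursts that each span about a minute would match the deferral behaviour described further down the thread.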


You should eliminate the possibility of this being a side effect of the leak
by applying the one-line patch that plugs it.  If, after applying the patch,
you're still seeing this, it points the finger directly at the lookups
themselves.  Increasing the number of daemons would help if the failure were
due to insufficient software resources (a server slow to respond).  If
increasing the number of daemons makes no difference, look at the hardware.

What happens is that if the connection to the LDAP server breaks for some
reason, courierldapaliasd waits one minute before trying to connect again.
Until that happens, all alias requests are deferred in this manner. So, if a
network glitch broke the connection, aliasing will not work for one minute.
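The deferral window described above can be modelled as a small state machine: a connection failure opens a 60-second window during which every lookup is deferred, and once the window expires a reconnect is allowed. This is a hypothetical Python model for illustration only, not Courier's actual code; the class and method names are invented, and the clock is injectable so the timing can be tested.

```python
import time
from typing import Callable, Optional

RETRY_DELAY_S = 60.0  # the one-minute wait described in the thread


class AliasLookupGate:
    """Hypothetical model of the deferral logic (not Courier's source)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._failed_at: Optional[float] = None

    def connection_failed(self) -> None:
        # Record when the LDAP connection broke; this opens the window.
        self._failed_at = self._clock()

    def should_defer(self) -> bool:
        """True while the window is open: lookups get a temporary error."""
        if self._failed_at is None:
            return False
        if self._clock() - self._failed_at < RETRY_DELAY_S:
            return True
        # Window expired: clear it so a reconnect can be attempted.
        self._failed_at = None
        return False
```

A single network glitch therefore produces a burst of deferrals lasting up to a minute, regardless of how many alias daemons are running.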

This is because older versions of the OpenLDAP client libraries leak memory
if a connection request fails.  To prevent all memory from being leaked,
ldapaliasd restarts itself if the connection fails.  To prevent the machine
from forkbombing itself if the LDAP server is down, a one minute timer runs
during which all alias requests are deferred.  Since this is a temporary
deferral, no mail will actually be lost: the sender will try again later.
After the one minute timer runs out, ldapaliasd restarts itself.
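The reason a deferral loses no mail is that SMTP reply codes are classified by their first digit: a 4xx reply is a transient failure, so the sending server keeps the message queued and retries, while a 5xx reply is permanent. Both 400 and 450 fall into the transient class. A minimal sketch of that classification per RFC 5321 (section 4.2.1); the function name is hypothetical.

```python
def reply_class(reply: str) -> str:
    """Classify an SMTP reply line by the first digit of its code."""
    code = reply[:3]
    if len(code) != 3 or not (code.isascii() and code.isdigit()):
        raise ValueError(f"not an SMTP reply: {reply!r}")
    return {
        "2": "success",
        "3": "intermediate",
        "4": "transient",  # sender keeps the message queued and retries
        "5": "permanent",  # sender gives up and bounces the message
    }.get(code[0], "unknown")
```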

Before upgrading Courier, you should also upgrade your OpenLDAP client
libraries to a more recent version, where this leak is plugged, compile
Courier against the new version of OpenLDAP, and add LDAP_MEMORY_LEAK=0 to
ldapaliasrc.



_______________________________________________
courier-users mailing list