Hi,

I'm running 1.2.11.32 with 6 replicas (two of which are read-only).  I ran 
into an issue where a DELETE operation failed on one server with error code 51 
(LDAP_BUSY).


[21/Oct/2014:23:44:44 -0400] conn=78160 op=39510 RESULT err=51 tag=107 nentries=0 etime=3 csn=5447282c000300050000

The application retried the delete several times for a couple of hours (while 
the server wasn't getting any other requests) and the result was always the 
same (err=51).  Each time that happened, the error log had the following:


[21/Oct/2014:23:44:44 -0400] - Retry count exceeded in delete

My first question is, what would cause a problem like this?

I simply restarted that directory server, and the update then succeeded.  
However, when the update was replicated to the other 5 servers, it failed on 
each of them in the same way, and the same error was logged in their log 
files.  The update wasn't retried there, though.  It was just skipped, and 
later updates via replication succeeded on those 5 servers.

My second question is, what's the best way to monitor for these types of 
replication errors?  In this case, nsds5replicaLastUpdateStatus did not 
indicate a problem.  If I had not been looking at the error log on those 5 
hosts, I'm wondering how I would have known that a delete failed to replicate 
to them.  If the answer is to just have something monitoring the error log 
files, are there specific search strings to look for to separate out updates 
that have failed and won't be retried from other errors (e.g. temporary 
connection issues)?  Just curious if there is a best practice here.
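
To make the log-monitoring idea concrete, here is a minimal sketch of such a 
check in Python.  It assumes the only known signature is the "Retry count 
exceeded in delete" line quoted above; the function name and the generalization 
to operations other than delete are hypothetical, and other failure messages 
may well exist that this would miss.

```python
import re

# Matches error-log lines shaped like the one quoted in this message:
#   [21/Oct/2014:23:44:44 -0400] - Retry count exceeded in delete
# Assumption: other operations would be logged with the same wording.
RETRY_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+-\s+Retry count exceeded in (?P<op>\w+)"
)


def find_retry_exceeded(lines):
    """Return a (timestamp, operation) tuple for each matching log line."""
    hits = []
    for line in lines:
        m = RETRY_RE.match(line.strip())
        if m:
            hits.append((m.group("ts"), m.group("op")))
    return hits


if __name__ == "__main__":
    import sys

    # Usage: python check_errors.py /path/to/error/log
    with open(sys.argv[1], errors="replace") as fh:
        for ts, op in find_retry_exceeded(fh):
            print(f"{ts}: retry count exceeded in {op}")
```

This only flags the one message seen here; it does not by itself distinguish 
updates that were skipped for good from errors that are later retried.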

Thanks!

- Shilen
--
389 users mailing list
https://admin.fedoraproject.org/mailman/listinfo/389-users