Re: [389-users] Error code 51 and replication errors

Rich Megginson Wed, 22 Oct 2014 10:47:26 -0700

On 10/22/2014 11:35 AM, Shilen Patel wrote:

Thanks for the information. I’m actually running 6.5, not 6.6. The latest version I’m seeing for 6.5 is 1.2.11.15-34.el6_5. Is that version for 6.5 about the same (in terms of bug fixes) as 1.2.11.15-47 in 6.6?

Is 1.2.11.15-34.el6_5 the same as 1.2.11.15-47? No. -47 has a lot more bug fixes.

If so, I’ll check out 1.2.11.15-34 in 6.5. Otherwise, I’ll upgrade to 6.6 first. Appreciate the help.


Thanks!

— Shilen

From: Rich Megginson <[email protected]>
Reply-To: <[email protected]>
Date: Wednesday, October 22, 2014 at 1:10 PM
To: <[email protected]>
Subject: Re: [389-users] Error code 51 and replication errors

    On 10/22/2014 10:58 AM, Shilen Patel wrote:

    1.2.11.15 is a couple of years old?


    Yes and no.  1.2.11.15 was the starting point for EL6. However,
    many, many features and fixes have been backported from later
    versions into 1.2.11.15-47 in EL 6.6.

    I had to upgrade to the latest in copr because of another issue
    that I think was fixed in 1.2.11.30.


    Has that issue been fixed in 1.2.11.15-47 in EL 6.6?  I know a lot
    of 389 community members running on EL6 were using
    fedorapeople/copr repos because they could not wait until those
    fixes/features were available in EL 6.6.  Now that EL 6.6 is out,
    I encourage you (and anyone else in this situation) to stop using
    fedorapeople/copr builds and instead use 1.2.11.15-47 in EL 6.6.

    If I’m misunderstanding version numbers in EL vs copr, please let
    me know.


    See above.

    But my main question is the second question regarding best
    practices for detecting replication failures and I think that
    applies to all versions?


    nsds5replicaLastUpdateStatus is the documented way to get
    replication status.  The fact that this error is not being
    reported that way seems like a bug.
    You can also monitor the error logs.
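
As a rough illustration of acting on that attribute, here is a minimal Python sketch. It assumes the nsds5replicaLastUpdateStatus values have already been fetched from the replication agreement entries by some other means, and that each value begins with a numeric code where 0 means success, optionally wrapped as "Error (N)". That value format is an assumption and is not stated in this thread; `parse_status` and `failing_agreements` are hypothetical helper names.

```python
import re

# Assumed value layout: "<code> <text>" or "Error (<code>) <text>".
STATUS_RE = re.compile(r"^(?:Error\s*\()?(-?\d+)\)?\s*(.*)$")


def parse_status(value):
    """Split a status string into (code, text); code is None if unparseable."""
    m = STATUS_RE.match(value.strip())
    if m is None:
        return None, value.strip()
    return int(m.group(1)), m.group(2).strip()


def failing_agreements(statuses):
    """Return agreements whose status code is not 0.

    `statuses` maps an agreement name to its status string.
    Unparseable values are reported too, so they are not silently ignored.
    """
    bad = {}
    for name, value in statuses.items():
        code, text = parse_status(value)
        if code != 0:
            bad[name] = (code, text)
    return bad
```

Since the failure discussed in this thread did not show up in the attribute at all, a check like this would only be one half of the monitoring; it does not replace watching the error logs.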

    As for this particular problem, see
    https://fedorahosted.org/389/ticket/47409


    Thanks!

    — Shilen

    From: Rich Megginson <[email protected]>
    Reply-To: <[email protected]>
    Date: Wednesday, October 22, 2014 at 12:14 PM
    To: <[email protected]>
    Subject: Re: [389-users] Error code 51 and replication errors

        On 10/22/2014 10:10 AM, Shilen Patel wrote:


        389-ds-base-1.2.11.32-1.el6.x86_64


        I would strongly encourage you to use the version provided
        with EL 6.6, which is 389-ds-base-1.2.11.15-47.  It looks
        like you are using a build from the old rmeggins repo or the
        newer copr repo.  These are really only for those users who
        needed critical fixes or features not yet in the "supported"
        EL6.6 version.  I don't know if that will fix your problem,
        but it will make it a lot easier to support.


        Thanks!

        — Shilen

        From: Rich Megginson <[email protected]>
        Reply-To: <[email protected]>
        Date: Wednesday, October 22, 2014 at 12:07 PM
        To: <[email protected]>
        Subject: Re: [389-users] Error code 51 and replication errors

            On 10/22/2014 09:54 AM, Shilen Patel wrote:

            Hi,

            I’m running 1.2.11.32.


            What is output of rpm -q 389-ds-base?

            I have 6 replicas (two of which are read-only).  I ran
            into an issue where a DELETE operation failed on a
            server with error code 51 (ldap busy).

            [21/Oct/2014:23:44:44 -0400] conn=78160 op=39510 RESULT
            err=51 tag=107 nentries=0 etime=3 csn=5447282c000300050000
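
For anyone who wants to pick these results out of the access log programmatically, here is a minimal Python sketch that parses a RESULT line like the one above and flags err=51. The regular expression is inferred only from the single sample line in this message (joined back onto one line), so the field layout is an assumption; `parse_result` and `busy_results` are hypothetical helper names, not part of 389-ds-base.

```python
import re

# Layout inferred from the one sample RESULT line quoted in this thread.
RESULT_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+conn=(?P<conn>\d+)\s+op=(?P<op>\d+)\s+RESULT\s+"
    r"err=(?P<err>\d+)\s+tag=(?P<tag>\d+)\s+nentries=(?P<nentries>\d+)\s+"
    r"etime=(?P<etime>\S+)(?:\s+csn=(?P<csn>\S+))?"
)

LDAP_BUSY = 51  # the result code reported in this thread


def parse_result(line):
    """Return a dict of fields for a RESULT line, or None for other lines."""
    m = RESULT_RE.match(line.strip())
    if m is None:
        return None
    rec = m.groupdict()
    for key in ("conn", "op", "err", "tag", "nentries"):
        rec[key] = int(rec[key])
    return rec


def busy_results(lines):
    """Return the parsed RESULT records whose err code is 51."""
    parsed = (parse_result(line) for line in lines)
    return [rec for rec in parsed if rec and rec["err"] == LDAP_BUSY]
```

Calling `busy_results(open(path))` on an access log would then list each busy result with its conn, op and csn, which makes it easier to match a failed operation against the error log by timestamp.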


            The application retried the delete several times for a
            couple of hours (while the server wasn’t getting any
            other requests) and the result was always the same
            (err=51).  Each time that happened, the error log had
            the following:

            [21/Oct/2014:23:44:44 -0400] - Retry count exceeded in
            delete


            My first question is, what would cause a problem like this?

            I simply restarted that directory and then the update
            succeeded.  However, when the update went to the other
            5 servers, they failed in the same way and the same
            error was logged in their log files.  But the update
            wasn’t retried.  It was just skipped and future updates
            via replication succeeded on those 5 servers.

            My second question is, what’s the best way to monitor
            for these types of replication errors?  In this
            case, nsds5replicaLastUpdateStatus did not indicate a
            problem.  If I had not been looking at the error file
            on those 5 hosts, I’m wondering how I would have known
            that a delete failed to replicate to them.  If the
            answer is to just have something monitoring the error
            log files, are there specific search strings to look
            for to separate out updates that have failed and won’t
            be retried from other errors (e.g. temporary connection
            issues)?  Just curious if there is a best practice here.
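
To make the log-monitoring idea concrete, here is a minimal Python sketch of a scanner for the error log. The only search string it knows is the "Retry count exceeded" message quoted above; which other strings mark updates that will not be retried is exactly the open question here, so the pattern list is a placeholder to extend. The line layout (bracketed timestamp, then the message) is assumed from the one sample line, and `find_matches` is a hypothetical helper name.

```python
import re

# Assumed layout: "[timestamp] message", taken from the sample error-log line.
ERROR_LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\]\s+(?P<msg>.*)$")

# Only the string seen in this thread; extend once more are known.
PATTERNS = ("Retry count exceeded",)


def find_matches(lines, patterns=PATTERNS):
    """Return (timestamp, message) pairs for lines containing any pattern."""
    hits = []
    for line in lines:
        m = ERROR_LINE_RE.match(line.strip())
        if m is None:
            continue  # skip lines that do not start with a timestamp
        msg = m.group("msg")
        if any(p in msg for p in patterns):
            hits.append((m.group("ts"), msg))
    return hits
```

Run periodically against each replica's error log, something like this would at least surface the skipped delete described above, even though it cannot by itself tell a permanent failure from a temporary one.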

            Thanks!

            — Shilen


            --
            389 users mailing list
            [email protected]
            https://admin.fedoraproject.org/mailman/listinfo/389-users




