Mark,

Yes, I'm in frame 3, and no, I don't know what the modify operation is, sorry. I think that's what I'm trying to find out: why one of the servers always crashes when I enable replication between the two 389 servers.

Maybe I should reconfigure my replication, enable the debug log, and see where it stops? What else can I do?
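For example, one thing I could try is raising the error log level so replication debugging ends up in the errors log. Roughly like this (just a sketch; I'm assuming 8192 is still the replication debug level, and "slapd-MYINSTANCE" is a placeholder for my instance name):

  # raise the error log level to include replication debugging
  dsconf slapd-MYINSTANCE config replace nsslapd-errorlog-level=8192

  # or the same change via ldapmodify against cn=config
  ldapmodify -D "cn=Directory Manager" -W -H ldap://localhost <<EOF
  dn: cn=config
  changetype: modify
  replace: nsslapd-errorlog-level
  nsslapd-errorlog-level: 8192
  EOF

Then I would watch /var/log/dirsrv/slapd-MYINSTANCE/errors while replication runs and see where it stops.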
Thanks

On Wed, Apr 22, 2020 at 4:34 PM Mark Reynolds <mreyno...@redhat.com> wrote:
>
> On 4/22/20 3:27 PM, Alberto Viana wrote:
>
> Mark,
>
> Here's:
> (gdb) where
> #0  0x00007ffff455399f in raise () at /lib64/libc.so.6
> #1  0x00007ffff453dcf5 in abort () at /lib64/libc.so.6
> #2  0x00007ffff5430cd0 in PR_Assert () at /lib64/libnspr4.so
> #3  0x00007ffff7b71627 in slapi_valueset_done (vs=0x7fff8c022aa8) at ldap/servers/slapd/valueset.c:471
> #4  0x00007ffff7b72257 in valueset_array_purge (a=0x7fff8c022aa0, vs=0x7fff8c022aa8, csn=0x7fff977fd340) at ldap/servers/slapd/valueset.c:804
> #5  0x00007ffff7b723c5 in valueset_purge (a=0x7fff8c022aa0, vs=0x7fff8c022aa8, csn=0x7fff977fd340) at ldap/servers/slapd/valueset.c:834
> #6  0x00007ffff7ada6fa in entry_delete_present_values_wsi_multi_valued (e=0x7fff8c01f500, type=0x7fff8c012780 "memberOf", vals=0x0, csn=0x7fff977fd340, urp=8, mod_op=2, replacevals=0x7fff8c0127c0) at ldap/servers/slapd/entrywsi.c:777
> #7  0x00007ffff7ada20d in entry_delete_present_values_wsi (e=0x7fff8c01f500, type=0x7fff8c012780 "memberOf", vals=0x0, csn=0x7fff977fd340, urp=8, mod_op=2, replacevals=0x7fff8c0127c0) at ldap/servers/slapd/entrywsi.c:623
> #8  0x00007ffff7adaa7a in entry_replace_present_values_wsi (e=0x7fff8c01f500, type=0x7fff8c012780 "memberOf", vals=0x7fff8c0127c0, csn=0x7fff977fd340, urp=8) at ldap/servers/slapd/entrywsi.c:869
> #9  0x00007ffff7adabf1 in entry_apply_mod_wsi (e=0x7fff8c01f500, mod=0x7fff8c0127a0, csn=0x7fff977fd340, urp=8) at ldap/servers/slapd/entrywsi.c:903
> #10 0x00007ffff7adae52 in entry_apply_mods_wsi (e=0x7fff8c01f500, smods=0x7fff977fd3c0, csn=0x7fff8c012160, urp=8) at ldap/servers/slapd/entrywsi.c:973
> #11 0x00007fffead19364 in modify_apply_check_expand (pb=0x7fff8c000b20, operation=0x814160, mods=0x7fff8c012750, e=0x7fff8c01bc90, ec=0x7fff8c01f480, postentry=0x7fff977fd4b0, ldap_result_code=0x7fff977fd434, ldap_result_message=0x7fff977fd4d8) at ldap/servers/slapd/back-ldbm/ldbm_modify.c:247
> #12 0x00007fffead1a430 in ldbm_back_modify (pb=0x7fff8c000b20) at ldap/servers/slapd/back-ldbm/ldbm_modify.c:665
> #13 0x00007ffff7b0cd60 in op_shared_modify (pb=0x7fff8c000b20, pw_change=0, old_pw=0x0) at ldap/servers/slapd/modify.c:1021
> #14 0x00007ffff7b0b266 in do_modify (pb=0x7fff8c000b20) at ldap/servers/slapd/modify.c:380
> #15 0x000000000041592c in connection_dispatch_operation (conn=0x150e220, op=0x814160, pb=0x7fff8c000b20) at ldap/servers/slapd/connection.c:638
> #16 0x0000000000417a0e in connection_threadmain () at ldap/servers/slapd/connection.c:1767
> #17 0x00007ffff544a568 in _pt_root () at /lib64/libnspr4.so
> #18 0x00007ffff4de52de in start_thread () at /lib64/libpthread.so.0
> #19 0x00007ffff46184b3 in clone () at /lib64/libc.so.6
> (gdb) print *vs->sorted[0]
> Cannot access memory at address 0xffffffffffffffff
>
> Are you in the slapi_valueset_done frame?
>
> Do you know what the modify operation is doing? It's something with memberOf, but if you knew the exact operation, and what the entry looks like prior to making that update, it would be very useful to us.
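> If you still have the core file, one way to pull the operation itself out of it is
> something like this (just a sketch based on the frame numbers in your trace; whether
> the values are strings or bervals depends on the LDAP_MOD_BVALUES bit in mod_op):
>
>   (gdb) frame 9
>   (gdb) print *mod
>   (gdb) print mod->mod_type
>   (gdb) print mod->mod_vals.modv_bvals[0]->bv_val
>
> The target entry's DN should also be on the last "MOD dn=" line for that connection in
> the access log; setting nsslapd-accesslog-logbuffering to "off" beforehand makes it
> more likely that line reaches disk before the crash.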
>
> Thanks,
> Mark
>
> Thanks,
>
> Alberto Viana
>
> On Wed, Apr 22, 2020 at 4:22 PM Mark Reynolds <mreyno...@redhat.com> wrote:
>
>>
>> On 4/22/20 3:15 PM, Alberto Viana wrote:
>>
>> William,
>>
>> Here's:
>>
>> (gdb) frame 3
>> #3  0x00007ffff7b71627 in slapi_valueset_done (vs=0x7fff8c022aa8) at ldap/servers/slapd/valueset.c:471
>> 471          PR_ASSERT((vs->sorted == NULL) || (vs->num < VALUESET_ARRAY_SORT_THRESHOLD) || ((vs->num >= VALUESET_ARRAY_SORT_THRESHOLD) && (vs->sorted[0] < vs->num)));
>> (gdb) print *vs
>> $1 = {num = 21, max = 32, sorted = 0x7fff8c023ad0, va = 0x7fff8c022b50}
>>
>> Can you also do a "print *vs->sorted[0]" ?
>>
>> And a "where" so we can see the full stack trace that leads up to this assertion?
>>
>> Thanks,
>>
>> Mark
>>
>> Thanks,
>>
>> Alberto Viana
>>
>> On Sun, Apr 19, 2020 at 8:52 PM William Brown <wbr...@suse.de> wrote:
>>
>>>
>>> > On 18 Apr 2020, at 02:55, Alberto Viana <alberto...@gmail.com> wrote:
>>> >
>>> > Hi Guys,
>>> >
>>> > I build my own packages (from source), here's the info:
>>> > 389-ds-base-1.4.2.8-20200414gitfae920fc8.el8.x86_64.rpm
>>> > 389-ds-base-debuginfo-1.4.2.8-20200414gitfae920fc8.el8.x86_64.rpm
>>> > python3-lib389-1.4.2.8-20200414gitfae920fc8.el8.noarch.rpm
>>> >
>>> > I'm running in centos8.
>>> >
>>> > Here's what I could debug:
>>> > https://gist.github.com/albertocrj/4d74732e4e357fbc5a27296199127a62
>>> > https://gist.github.com/albertocrj/94fc3521024c7a508f1726923936e476
>>>
>>> So that assert seems to be:
>>>
>>> PR_ASSERT((vs->sorted == NULL) || (vs->num < VALUESET_ARRAY_SORT_THRESHOLD) || ((vs->num >= VALUESET_ARRAY_SORT_THRESHOLD) && (vs->sorted[0] < vs->num)));
>>>
>>> But it's not clear which condition here is being violated.
>>>
>>> It looks like your catching this in GDB though, so can you go to:
>>>
>>> https://gist.github.com/albertocrj/4d74732e4e357fbc5a27296199127a62
>>>
>>> (gdb) frame 3
>>> (gdb) print *vs
>>>
>>> That would help to work out what condition is incorrectly being asserted here.
>>>
>>> Thanks!
>>>
>>> >
>>> > Do you guys need something else?
>>> >
>>> > Thanks
>>> >
>>> > Alberto Viana
>>> >
>>> > On Tue, Mar 31, 2020 at 8:03 PM William Brown <wbr...@suse.de> wrote:
>>> >
>>> > > On 1 Apr 2020, at 05:18, Mark Reynolds <mreyno...@redhat.com> wrote:
>>> > >
>>> > > On 3/31/20 1:36 PM, Alberto Viana wrote:
>>> > >> Hey Guys,
>>> > >>
>>> > >> 389-Directory/1.4.2.8
>>> > >>
>>> > >> 389 (master) <=> 389 (master)
>>> > >>
>>> > >> In a master to master replication, start to see this error :
>>> > >> [31/Mar/2020:17:30:52.610637150 +0000] - WARN - NSMMReplicationPlugin - replica_check_for_data_reload - Disorderly shutdown for replica dc=rnp,dc=local. Check if DB RUV needs to be updated
>>> >
>>> > Also might be good to remind us what distro and packages you have 389-ds from?
>>> >
>>> > > Looks like the server is crashing which is why you see these disorderly shutdown messages. Please get a core file and take some stack traces from it:
>>> > >
>>> > > http://www.port389.org/docs/389ds/FAQ/faq.html#sts=Debugging%C2%A0Crashes
>>> > >
>>> > > Can you please provide the complete logs? Also, you might want to try re-initializing the replication agreement instead of disabling and re-enabling replication (its less painful and it "might" solve the issue).
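>>> > > The crash-debugging steps in that FAQ boil down to roughly this (a sketch; the
>>> > > instance name, core file path, and agreement name are placeholders, and it is
>>> > > worth double-checking the dsconf syntax on your version):
>>> > >
>>> > >   # pull full stack traces out of a core file
>>> > >   gdb -ex 'set pagination off' -ex 'thread apply all bt full' -ex 'quit' \
>>> > >       /usr/sbin/ns-slapd /path/to/core.PID > stacktrace.txt
>>> > >
>>> > >   # re-initialize the agreement with dsconf instead of re-creating it
>>> > >   dsconf slapd-MYINSTANCE repl-agmt init MYAGREEMENT --suffix dc=rnp,dc=local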
>>> > >
>>> > > Mark
>>> > >
>>> > >> Even after restart the service the problem persists, I have to disable and re-enable replication (and replication agr) on both sides, it works for some time, and the problem comes back.
>>> > >>
>>> > >> Any tips?
>>> > >>
>>> > >> Thanks
>>> > >>
>>> > >> Alberto Viana
>>> > >
>>> > > --
>>> > >
>>> > > 389 Directory Server Development Team
>>> >
>>> > —
>>> > Sincerely,
>>> >
>>> > William Brown
>>> >
>>> > Senior Software Engineer, 389 Directory Server
>>> > SUSE Labs
>>>
>>> —
>>> Sincerely,
>>>
>>> William Brown
>>>
>>> Senior Software Engineer, 389 Directory Server
>>> SUSE Labs
>>
>> --
>>
>> 389 Directory Server Development Team
>
> --
>
> 389 Directory Server Development Team