Thanks. I added /var/log/messages to the gist (https://gist.github.com/nevsan/8b6f78d7396963dc5f70); it doesn't seem to show any segfaults. Are there other kinds of disorderly shutdown that might be happening here? I'll look into creating a ticket for this.
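In case it helps anyone else checking for the same thing, here is a rough sketch of a scan that goes slightly beyond `grep -i segfault`. The pattern list is my own guess at what a disorderly shutdown might leave behind (the out-of-memory patterns in particular are an assumption, not something seen in these logs), and the names `PATTERNS`, `find_crash_hints` and `scan_file` are made up for illustration.

```python
import re
from typing import Iterable, List, Tuple

# Illustrative patterns only; this is not an exhaustive list of crash causes.
PATTERNS = {
    # Same idea as `grep -i segfault /var/log/messages`.
    "segfault": re.compile(r"segfault", re.IGNORECASE),
    # Assumption: a kernel OOM kill would also stop ns-slapd without a clean shutdown.
    "oom": re.compile(r"out of memory|oom-killer", re.IGNORECASE),
    # Message that dirsrv writes to its errors log on the next startup.
    "disorderly": re.compile(r"Detected Disorderly Shutdown", re.IGNORECASE),
}


def find_crash_hints(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (label, line) pairs for every line matching a pattern."""
    hits = []
    for line in lines:
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((label, line.rstrip("\n")))
    return hits


def scan_file(path: str) -> List[Tuple[str, str]]:
    """Scan one log file, tolerating undecodable bytes."""
    with open(path, errors="replace") as handle:
        return find_crash_hints(handle)
```

Calling `scan_file("/var/log/messages")` and `scan_file` on the errors log under /var/log/dirsrv/slapd-EXAMPLE-COM/ as root would list any matching lines; an empty result only means none of these particular patterns appeared.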
On Fri, Apr 4, 2014 at 9:16 PM, Rich Megginson <rmegg...@redhat.com> wrote:

> On 04/03/2014 10:25 PM, Nevada Sanchez wrote:
>
> I followed the instructions that would give me a core dump, and for some
> reason, I don't see one in /var/log/dirsrv/slapd-EXAMPLE-COM/, even though
> the Disorderly shutdown still shows up in the logs.
>
> Hmm - check again - it should produce a core file
>
> grep -i segfault /var/log/messages
>
> I know that when I explicitly request those attributes, I get "-1 Total
> update abortedLDAP error: Can't contact LDAP server" for
> nsds5ReplicaLastInitStatus (see below). Access logs stop completely on the
> replica after the time that you mentioned.
>
> Hmm - looks like a bug. Please open a ticket.
>
> ======================================================
> [root@ipa2 ipaserver]# ldapsearch ldaps://ipa.example.com:636 -D
> 'cn=Directory Manager' -w ##### -b
> 'cn=meToipa2.example.com,cn=replica,cn=dc\=example\,dc\=com,cn=mapping
> tree,cn=config' '(objectClass=*)' -s base nsds5ReplicaLastInitStart
> nsds5replicaUpdateInProgress nsds5ReplicaLastInitStatus cn
> nsds5BeginReplicaRefresh nsds5ReplicaLastInitEnd
> # extended LDIF
> #
> # LDAPv3
> # base <cn=meToipa2.example.com,cn=replica,cn=dc\=example\,dc\=com,cn=mapping
> tree,cn=config> with scope baseObject
> # filter: (objectclass=*)
> # requesting: ldaps://ipa.example.com:636 (objectClass=*)
> nsds5ReplicaLastInitStart nsds5replicaUpdateInProgress
> nsds5ReplicaLastInitStatus cn nsds5BeginReplicaRefresh
> nsds5ReplicaLastInitEnd
> #
>
> # meToipa2.example.com, replica, dc\3Dexample\2Cdc\3Dcom,
> mapping tree, config
> dn: cn=meToipa2.example.com,cn=replica,cn=dc\3Dexample\2Cd
>  c\3Dcom,cn=mapping tree,cn=config
> nsds5ReplicaLastInitStart: 20140401092800Z
> nsds5replicaUpdateInProgress: FALSE
> nsds5ReplicaLastInitStatus: -1 Total update abortedLDAP error: Can't contact L
>  DAP server
> cn: meToipa2.example.com
> nsds5ReplicaLastInitEnd: 20140401092804Z
>
> # search result
> search: 2
> result: 0 Success
>
> # numResponses: 2
> # numEntries: 1
>
> On Thu, Apr 3, 2014 at 6:32 PM, Rich Megginson <rmegg...@redhat.com> wrote:
>
>> On 04/03/2014 03:46 PM, Nevada Sanchez wrote:
>>
>> Okay, I updated the gist and extended some of the logs (ipa2-errors does
>> stop at 20:50:21). I'll follow up when I have the debug stuff in place.
>>
>> https://gist.github.com/nevsan/8b6f78d7396963dc5f70
>>
>> Another strange thing - it looks as if the initial replica init
>> completes successfully.
>>
>> [02/Apr/2014:20:50:18 +0000] NSMMReplicationPlugin - Beginning total
>> update of replica "agmt="cn=meToipa2.example.com" (ipa2:389)".
>>
>> On the replica:
>>
>> [02/Apr/2014:20:50:18 +0000] NSMMReplicationPlugin -
>> multimaster_be_state_change: replica dc=example,dc=com is going offline;
>> disabling replication
>> [02/Apr/2014:20:50:18 +0000] - WARNING: Import is running with
>> nsslapd-db-private-import-mem on; No other process is allowed to access the
>> database
>> [02/Apr/2014:20:50:21 +0000] - import userRoot: Workers finished;
>> cleaning up...
>> [02/Apr/2014:20:50:21 +0000] - import userRoot: Workers cleaned up.
>> [02/Apr/2014:20:50:21 +0000] - import userRoot: Indexing complete.
>> Post-processing...
>> [02/Apr/2014:20:50:21 +0000] - import userRoot: Generating
>> numSubordinates complete.
>> [02/Apr/2014:20:50:21 +0000] - import userRoot: Flushing caches...
>> [02/Apr/2014:20:50:21 +0000] - import userRoot: Closing files...
>> [02/Apr/2014:20:50:21 +0000] - import userRoot: Import complete.
>> Processed 453 entries in 3 seconds. (151.00 entries/sec)
>> [02/Apr/2014:20:50:21 +0000] NSMMReplicationPlugin -
>> multimaster_be_state_change: replica dc=example,dc=com is coming online;
>> enabling replication
>>
>> On the master, access log:
>>
>> [02/Apr/2014:20:50:17 +0000] conn=1365 op=15 MOD dn="cn=
>> meToipa2.example.com,cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping
>> tree,cn=config"
>>
>> This is the operation that triggers the replica init. Then
>> ipa-replica-install polls for agreement status:
>> [02/Apr/2014:20:50:19 +0000] conn=1365 op=16 SRCH base="cn=
>> meToipa2.example.com,cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping
>> tree,cn=config" scope=0 filter="(objectClass=*)"
>> attrs="nsds5replicaLastInitStart nsds5replicaUpdateInProgress
>> nsds5replicaLastInitStatus cn nsds5BeginReplicaRefresh
>> nsds5replicaLastInitEnd"
>> [02/Apr/2014:20:50:19 +0000] conn=1365 op=16 RESULT err=0 tag=101
>> nentries=1 etime=0
>> [02/Apr/2014:20:50:20 +0000] conn=1365 op=17 SRCH base="cn=
>> meToipa2.example.com,cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping
>> tree,cn=config" scope=0 filter="(objectClass=*)"
>> attrs="nsds5replicaLastInitStart nsds5replicaUpdateInProgress
>> nsds5replicaLastInitStatus cn nsds5BeginReplicaRefresh
>> nsds5replicaLastInitEnd"
>> [02/Apr/2014:20:50:20 +0000] conn=1365 op=17 RESULT err=0 tag=101
>> nentries=1 etime=0
>> [02/Apr/2014:20:50:21 +0000] conn=1365 op=18 SRCH base="cn=
>> meToipa2.example.com,cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping
>> tree,cn=config" scope=0 filter="(objectClass=*)"
>> attrs="nsds5replicaLastInitStart nsds5replicaUpdateInProgress
>> nsds5replicaLastInitStatus cn nsds5BeginReplicaRefresh
>> nsds5replicaLastInitEnd"
>> [02/Apr/2014:20:50:21 +0000] conn=1365 op=18 RESULT err=0 tag=101
>> nentries=1 etime=0
>> [02/Apr/2014:20:50:22 +0000] conn=1365 op=19 SRCH base="cn=
>> meToipa2.example.com,cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping
>> tree,cn=config" scope=0 filter="(objectClass=*)"
>> attrs="nsds5replicaLastInitStart nsds5replicaUpdateInProgress
>> nsds5replicaLastInitStatus cn nsds5BeginReplicaRefresh
>> nsds5replicaLastInitEnd"
>> [02/Apr/2014:20:50:22 +0000] conn=1365 op=19 RESULT err=0 tag=101
>> nentries=1 etime=1
>>
>> Something happens here. The replica init is done, according to the
>> replica error log. We don't have the replica access log from around this
>> time to see exactly when the connection was closed, but looking at the ipa
>> code, it would appear that ipa did not see a status of "Total update
>> succeeded". Not sure why the master would not have reported that, unless
>> there was some problem getting back the status from the replica.
>>
>> [02/Apr/2014:20:50:22 +0000] conn=1365 op=20 UNBIND
>> [02/Apr/2014:20:50:22 +0000] conn=1365 op=20 fd=114 closed - U1
>>
>> Then ipa-replica-install closes the connection and reports the error.
>>
>> On Thu, Apr 3, 2014 at 10:38 AM, Rich Megginson <rmegg...@redhat.com> wrote:
>>
>>> On 04/02/2014 09:22 PM, Nevada Sanchez wrote:
>>>
>>> Okay. Updated the gist with the additional logs:
>>> https://gist.github.com/nevsan/8b6f78d7396963dc5f70
>>>
>>> 1) Dirsrv is crashing:
>>> [02/Apr/2014:20:49:53 +0000] - 389-Directory/1.3.1.22.a1 B2014.073.1751
>>> starting up
>>> [02/Apr/2014:20:49:54 +0000] - Db home directory is not set. Possibly
>>> nsslapd-directory (optionally nsslapd-db-home-directory) is missing in the
>>> config file.
>>> [02/Apr/2014:20:49:54 +0000] - I'm resizing my cache now...cache was
>>> 710029312 and is now 8000000
>>> [02/Apr/2014:20:49:54 +0000] - 389-Directory/1.3.1.22.a1 B2014.073.1751
>>> starting up
>>> [02/Apr/2014:20:49:54 +0000] - Detected Disorderly Shutdown last time
>>> Directory Server was running, recovering database.
>>> [02/Apr/2014:20:49:55 +0000] - slapd started. Listening on All
>>> Interfaces port 389 for LDAP requests
>>>
>>> Please use the instructions at
>>> http://port389.org/wiki/FAQ#Debugging_Crashes to get a core dump and
>>> stack trace.
>>>
>>> 2) The first occurrence of the connection error is at
>>> [02/Apr/2014:20:52:38 +0000] but there isn't anything in the consumer error
>>> log after [02/Apr/2014:20:50:21 +0000] and in the consumer access log after
>>> [02/Apr/2014:20:50:22 +0000]
>>>
>>> On Wed, Apr 2, 2014 at 9:38 PM, Rich Megginson <rmegg...@redhat.com> wrote:
>>>
>>>> On 04/02/2014 03:01 PM, Nevada Sanchez wrote:
>>>>
>>>> Okay, I ran it with debug on. The output is quite large. I'm not sure
>>>> what the etiquette is for posting large logs, so I threw it on gist here:
>>>> https://gist.githubusercontent.com/nevsan/8b6f78d7396963dc5f70/raw/b76b3c3acce4f12d292d680f4c1dab39c05888d5/gistfile1.txt
>>>>
>>>> Let me know if I should copy it into the thread instead.
>>>>
>>>> Ok. Now can you post excerpts from the dirsrv errors log from both
>>>> the master replica and the replica from around the time of the failure?
>>>>
>>>> On Wed, Apr 2, 2014 at 1:49 PM, Rich Megginson <rmegg...@redhat.com> wrote:
>>>>
>>>>> On 04/02/2014 11:45 AM, Nevada Sanchez wrote:
>>>>>
>>>>> My apologies. I mistakenly ran the failing ldapsearch from an
>>>>> unprivileged user (couldn't read slapd-EXAMPLE-COM directory). Running as
>>>>> root, it now works just fine (same result as the one that worked). SSL
>>>>> seems to not be the issue. Also, I haven't changed the SSL certs since I
>>>>> first set up the master.
>>>>>
>>>>> I have been doing the replica side things from scratch (even so far
>>>>> as starting with a new machine). For the master side, I have just been
>>>>> re-preparing the replica. I hope I don't have to start from scratch with
>>>>> the master replica.
>>>>>
>>>>> I guess the next step would be to do the ipa-replica-install using
>>>>> -ddd and review the extra debug information that comes out.
>>>>>
>>>>> On Wed, Apr 2, 2014 at 11:45 AM, Rob Crittenden
>>>>> <rcrit...@redhat.com> wrote:
>>>>>
>>>>>> Rich Megginson wrote:
>>>>>>
>>>>>>> On 04/02/2014 09:20 AM, Nevada Sanchez wrote:
>>>>>>>
>>>>>>>> Okay, we might be on to something:
>>>>>>>>
>>>>>>>> ipa -> ipa2
>>>>>>>> ================================
>>>>>>>> $ LDAPTLS_CACERTDIR=/etc/dirsrv/slapd-EXAMPLE-COM ldapsearch -xLLLZZ
>>>>>>>> -h ipa2.example.com -s base -b "" 'objectclass=*' vendorVersion
>>>>>>>> dn:
>>>>>>>> vendorVersion: 389-Directory/1.3.1.22.a1 B2014.073.1751
>>>>>>>> ================================
>>>>>>>>
>>>>>>>> ipa2 -> ipa
>>>>>>>> ================================
>>>>>>>> $ LDAPTLS_CACERTDIR=/etc/dirsrv/slapd-EXAMPLE-COM ldapsearch -xLLLZZ
>>>>>>>> -h ipa.example.com -s base -b "" 'objectclass=*' vendorVersion
>>>>>>>> ldap_start_tls: Connect error (-11)
>>>>>>>> additional info: TLS error -8172:Peer's certificate issuer has been
>>>>>>>> marked as not trusted by the user.
>>>>>>>> ================================
>>>>>>>>
>>>>>>>> The original IPA trusts the replica (since it signed the cert, I
>>>>>>>> assume), but the replica doesn't trust the main IPA server. I guess
>>>>>>>> the ZZ option would have shown me the failure that I missed in my
>>>>>>>> initial ldapsearch tests.
>>>>>>>
>>>>>>> -Z[Z]  Issue StartTLS (Transport Layer Security) extended operation.
>>>>>>>        If you use -ZZ, the command will require the operation to be
>>>>>>>        successful.
>>>>>>>
>>>>>>> i.e. use SSL, and force a successful handshake
>>>>>>>
>>>>>>>> Anyway, what's the best way to remedy this in a way that makes IPA
>>>>>>>> happy? (I've found that LDAP can have different requirements on which
>>>>>>>> certs go where).
>>>>>>>
>>>>>>> I'm not sure.
>>>>>>> ipa-server-install/ipa-replica-prepare/ipa-replica-install
>>>>>>> is supposed to take care of installing the CA cert properly for you.
>>>>>>> If you try to hack it and install the CA cert manually, you will
>>>>>>> probably miss something else that ipa install did not do.
>>>>>>>
>>>>>>> I think the only way to ensure that you have a properly configured ipa
>>>>>>> server + replicas is to get all of the ipa commands completing
>>>>>>> successfully.
>>>>>>>
>>>>>>> Which means going back to the drawing board and starting over from
>>>>>>> scratch.
>>>>>>
>>>>>> You can compare the certs that each side is using with:
>>>>>>
>>>>>> # certutil -L -d /etc/dirsrv/slapd-EXAMPLE-COM
>>>>>>
>>>>>> Did you by chance replace the SSL server certs that IPA uses on your
>>>>>> working master?
>>>>>>
>>>>>> rob
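One practical note on the ldapsearch output quoted above: the nsds5ReplicaLastInitStatus value is split across lines ("...Can't contact L" / "DAP server") because LDIF folds long lines and marks each continuation with a single leading space. Below is a minimal sketch of unfolding that output before checking the status. The function names are hypothetical, base64 (`::`) values are not handled, and the "Total update succeeded" string is taken from Rich's remark earlier in the thread rather than from the IPA source.

```python
from typing import Dict, List


def unfold_ldif(text: str) -> List[str]:
    """Join LDIF continuation lines (those starting with one space)."""
    lines: List[str] = []
    for raw in text.splitlines():
        if raw.startswith(" ") and lines:
            lines[-1] += raw[1:]  # drop the fold marker, keep the rest
        else:
            lines.append(raw)
    return lines


def parse_attrs(text: str) -> Dict[str, List[str]]:
    """Collect 'name: value' pairs, keyed by lower-cased attribute name."""
    attrs: Dict[str, List[str]] = {}
    for line in unfold_ldif(text):
        if not line or line.startswith("#"):
            continue  # skip blank lines and ldapsearch comments
        name, sep, value = line.partition(": ")
        if sep:
            attrs.setdefault(name.lower(), []).append(value)
    return attrs


def total_update_succeeded(attrs: Dict[str, List[str]]) -> bool:
    """True if any init status value reports a successful total update."""
    statuses = attrs.get("nsds5replicalastinitstatus", [])
    return any("Total update succeeded" in status for status in statuses)
```

Feeding it the entry from the quoted output gives the status back as one string, "-1 Total update abortedLDAP error: Can't contact LDAP server", and `total_update_succeeded` returns False for it. Note that the trailing "search:" and "result:" lines are picked up as attributes too, which is harmless for this check.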
_______________________________________________
Freeipa-users mailing list
Freeipa-users@redhat.com
https://www.redhat.com/mailman/listinfo/freeipa-users