Thanks. I added /var/log/messages to the gist (
https://gist.github.com/nevsan/8b6f78d7396963dc5f70)--no segfaults it
seems. Any other kind of disorderly shutdowns that might happen? I'll look
into creating a ticket for this.


On Fri, Apr 4, 2014 at 9:16 PM, Rich Megginson <rmegg...@redhat.com> wrote:

>  On 04/03/2014 10:25 PM, Nevada Sanchez wrote:
>
> I followed the instructions that would give me a core dump, and for some
> reason, I don't see one in /var/log/dirsrv/slapd-EXAMPLE-COM/, even though
> I still see the Disorderly shutdown still shows up in the logs.
>
>
> Hmm - check again - it should produce a core file
>
> grep -i segfault /var/log/messages
>
>
>  I know that when I explicitly request those attributes, I get "-1 Total
> update abortedLDAP error: Can't contact LDAP server" for
> nds5ReplicaLastInitStatus (see below). Access logs stop completely on the
> replica after the time that you mentioned.
>
>
> Hmm - looks like a bug.  Please open a ticket.
>
>
>
>  ======================================================
>  [root@ipa2 ipaserver]# ldapsearch  ldaps://ipa.example.com:636 -D
> 'cn=Directory Manager' -w ##### -b 
> 'cn=meToipa2.example.com<http://metoipa2.example.com/>,cn=replica,cn=dc\=example\,dc\=com,cn=mapping
> tree,cn=config' '(objectClass=*)' -s base nsds5ReplicaLastInitStart
> nsds5replicaUpdateInProgress nsds5ReplicaLastInitStatus cn
> nsds5BeginReplicaRefresh nsds5ReplicaLastInitEnd
> # extended LDIF
> #
> # LDAPv3
>  # base <cn=meToipa2.example.com 
> <http://metoipa2.example.com/>,cn=replica,cn=dc\=example\,dc\=com,cn=mapping
> tree,cn=config> with scope baseObject
>  # filter: (objectclass=*)
> # requesting: ldaps://ipa.example.com:636 (objectClass=*)
> nsds5ReplicaLastInitStart nsds5replicaUpdateInProgress
> nsds5ReplicaLastInitStatus cn nsds5BeginReplicaRefresh
> nsds5ReplicaLastInitEnd
> #
>
>  # meToipa2.example.com <http://metoipa2.example.com/>, replica,
> dc\3Dexample\2Cdc\3Dcom,
>   mapping tree, config
> dn: cn=meToipa2.example.com <http://metoipa2.example.com/>
> ,cn=replica,cn=dc\3Dexample\2Cd
>  c\3Dcom,cn=mapping tree,cn=config
> nsds5ReplicaLastInitStart: 20140401092800Z
>  nsds5replicaUpdateInProgress: FALSE
> nsds5ReplicaLastInitStatus: -1 Total update abortedLDAP error: Can't
> contact L
>   DAP server
> cn: meToipa2.example.com <http://metoipa2.example.com/>
>  nsds5ReplicaLastInitEnd: 20140401092804Z
>
>  # search result
> search: 2
>  result: 0 Success
>
>  # numResponses: 2
>  # numEntries: 1
>
>
> On Thu, Apr 3, 2014 at 6:32 PM, Rich Megginson <rmegg...@redhat.com>wrote:
>
>>  On 04/03/2014 03:46 PM, Nevada Sanchez wrote:
>>
>> Okay, I updated the gist and extended some of the logs (ipa2-errors does
>> stop at 20:50:21). I'll follow up when I have the debug stuff in place.
>>
>>  https://gist.github.com/nevsan/8b6f78d7396963dc5f70
>>
>>
>>  Another strange thing - it looks as if the initial replica init
>> completes successfully.
>>
>> [02/Apr/2014:20:50:18 +0000] NSMMReplicationPlugin - Beginning total
>> update of replica "agmt="cn=meToipa2.example.com" (ipa2:389)".
>>
>> On the replica:
>>
>> [02/Apr/2014:20:50:18 +0000] NSMMReplicationPlugin -
>> multimaster_be_state_change: replica dc=example,dc=com is going offline;
>> disabling replication
>> [02/Apr/2014:20:50:18 +0000] - WARNING: Import is running with
>> nsslapd-db-private-import-mem on; No other process is allowed to access the
>> database
>> [02/Apr/2014:20:50:21 +0000] - import userRoot: Workers finished;
>> cleaning up...
>> [02/Apr/2014:20:50:21 +0000] - import userRoot: Workers cleaned up.
>> [02/Apr/2014:20:50:21 +0000] - import userRoot: Indexing complete.
>> Post-processing...
>> [02/Apr/2014:20:50:21 +0000] - import userRoot: Generating
>> numSubordinates complete.
>> [02/Apr/2014:20:50:21 +0000] - import userRoot: Flushing caches...
>> [02/Apr/2014:20:50:21 +0000] - import userRoot: Closing files...
>> [02/Apr/2014:20:50:21 +0000] - import userRoot: Import complete.
>> Processed 453 entries in 3 seconds. (151.00 entries/sec)
>> [02/Apr/2014:20:50:21 +0000] NSMMReplicationPlugin -
>> multimaster_be_state_change: replica dc=example,dc=com is coming online;
>> enabling replication
>>
>> On the master, access log:
>>
>> [02/Apr/2014:20:50:17 +0000] conn=1365 op=15 MOD dn="cn=
>> meToipa2.example.com,cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping
>> tree,cn=config"
>>
>> This is the operation that triggers the replica init.  Then
>> ipa-replica-install polls for agreement status:
>> [02/Apr/2014:20:50:19 +0000] conn=1365 op=16 SRCH base="cn=
>> meToipa2.example.com,cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping
>> tree,cn=config" scope=0 filter="(objectClass=*)"
>> attrs="nsds5replicaLastInitStart nsds5replicaUpdateInProgress
>> nsds5replicaLastInitStatus cn nsds5BeginReplicaRefresh
>> nsds5replicaLastInitEnd"
>> [02/Apr/2014:20:50:19 +0000] conn=1365 op=16 RESULT err=0 tag=101
>> nentries=1 etime=0
>> [02/Apr/2014:20:50:20 +0000] conn=1365 op=17 SRCH base="cn=
>> meToipa2.example.com,cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping
>> tree,cn=config" scope=0 filter="(objectClass=*)"
>> attrs="nsds5replicaLastInitStart nsds5replicaUpdateInProgress
>> nsds5replicaLastInitStatus cn nsds5BeginReplicaRefresh
>> nsds5replicaLastInitEnd"
>> [02/Apr/2014:20:50:20 +0000] conn=1365 op=17 RESULT err=0 tag=101
>> nentries=1 etime=0
>> [02/Apr/2014:20:50:21 +0000] conn=1365 op=18 SRCH base="cn=
>> meToipa2.example.com,cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping
>> tree,cn=config" scope=0 filter="(objectClass=*)"
>> attrs="nsds5replicaLastInitStart nsds5replicaUpdateInProgress
>> nsds5replicaLastInitStatus cn nsds5BeginReplicaRefresh
>> nsds5replicaLastInitEnd"
>> [02/Apr/2014:20:50:21 +0000] conn=1365 op=18 RESULT err=0 tag=101
>> nentries=1 etime=0
>> [02/Apr/2014:20:50:22 +0000] conn=1365 op=19 SRCH base="cn=
>> meToipa2.example.com,cn=replica,cn=dc\3Dexample\2Cdc\3Dcom,cn=mapping
>> tree,cn=config" scope=0 filter="(objectClass=*)"
>> attrs="nsds5replicaLastInitStart nsds5replicaUpdateInProgress
>> nsds5replicaLastInitStatus cn nsds5BeginReplicaRefresh
>> nsds5replicaLastInitEnd"
>> [02/Apr/2014:20:50:22 +0000] conn=1365 op=19 RESULT err=0 tag=101
>> nentries=1 etime=1
>>
>> Something happens here.  The replica init is done, according to the
>> replica error log.  We don't have the replica access log from around this
>> time to see exactly when the connection was closed, but looking at the ipa
>> code, it would appear that ipa did not see a status of "Total update
>> succeeded".  Not sure why the master would not have reported that, unless
>> there was some problem getting back the status from the replica.
>>
>> [02/Apr/2014:20:50:22 +0000] conn=1365 op=20 UNBIND
>> [02/Apr/2014:20:50:22 +0000] conn=1365 op=20 fd=114 closed - U1
>>
>> Then ipa-replica-install closes the connection and reports the error.
>>
>>
>>
>>
>> On Thu, Apr 3, 2014 at 10:38 AM, Rich Megginson <rmegg...@redhat.com>wrote:
>>
>>>  On 04/02/2014 09:22 PM, Nevada Sanchez wrote:
>>>
>>> Okay. Updated the gist with the additional logs:
>>> https://gist.github.com/nevsan/8b6f78d7396963dc5f70
>>>
>>>
>>>
>>>  1) Dirsrv is crashing:
>>> [02/Apr/2014:20:49:53 +0000] - 389-Directory/1.3.1.22.a1 B2014.073.1751
>>> starting up
>>> [02/Apr/2014:20:49:54 +0000] - Db home directory is not set. Possibly
>>> nsslapd-directory (optionally nsslapd-db-home-directory) is missing in the
>>> config file.
>>> [02/Apr/2014:20:49:54 +0000] - I'm resizing my cache now...cache was
>>> 710029312 and is now 8000000
>>> [02/Apr/2014:20:49:54 +0000] - 389-Directory/1.3.1.22.a1 B2014.073.1751
>>> starting up
>>> [02/Apr/2014:20:49:54 +0000] - Detected Disorderly Shutdown last time
>>> Directory Server was running, recovering database.
>>> [02/Apr/2014:20:49:55 +0000] - slapd started. Listening on All
>>> Interfaces port 389 for LDAP requests
>>>
>>> Please use the instructions at
>>> http://port389.org/wiki/FAQ#Debugging_Crashes to get a core dump and
>>> stack trace.
>>>
>>> 2) The first occurrence of the connection error is at
>>> [02/Apr/2014:20:52:38 +0000] but there isn't anything in the consumer error
>>> log after [02/Apr/2014:20:50:21 +0000] and in the consumer access log after
>>> [02/Apr/2014:20:50:22 +0000]
>>>
>>>
>>>   On Wed, Apr 2, 2014 at 9:38 PM, Rich Megginson <rmegg...@redhat.com>wrote:
>>>
>>>>  On 04/02/2014 03:01 PM, Nevada Sanchez wrote:
>>>>
>>>> Okay, I ran it with debug on. The output is quite large. I'm not sure
>>>> what the etiquette is for posting large logs, so I threw it on gist here:
>>>> https://gist.githubusercontent.com/nevsan/8b6f78d7396963dc5f70/raw/b76b3c3acce4f12d292d680f4c1dab39c05888d5/gistfile1.txt<http://gist.githubusercontent.com/nevsan/8b6f78d7396963dc5f70/raw/b76b3c3acce4f12d292d680f4c1dab39c05888d5/gistfile1.txt>
>>>>
>>>>  Let me know if I should copy it into the thread instead.
>>>>
>>>>
>>>>  Ok.  Now can you post excerpts from the dirsrv errors log from both
>>>> the master replica and the replica from around the time of the failure?
>>>>
>>>>
>>>>
>>>>
>>>> On Wed, Apr 2, 2014 at 1:49 PM, Rich Megginson <rmegg...@redhat.com>wrote:
>>>>
>>>>>  On 04/02/2014 11:45 AM, Nevada Sanchez wrote:
>>>>>
>>>>> My apologies. I mistakenly ran the failing ldapsearch from an
>>>>> unpriviliged user (couldn't read slapd-EXAMPLE-COM directory). Running as
>>>>> root, it now works just fine (same result as the one that worked). SSL
>>>>> seems to not be the issue. Also, I haven't change the SSL certs since I
>>>>> first set up the master.
>>>>>
>>>>>  I have been doing the replica side things from scratch (even so far
>>>>> as starting with a new machine). For the master side, I have just been
>>>>> re-preparing the replica. I hope I don't have to start from scratch with
>>>>> the master replica.
>>>>>
>>>>>
>>>>>  I guess the next step would be to do the ipa-replica-install using
>>>>> -ddd and review the extra debug information that comes out.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Apr 2, 2014 at 11:45 AM, Rob Crittenden 
>>>>> <rcrit...@redhat.com>wrote:
>>>>>
>>>>>> Rich Megginson wrote:
>>>>>>
>>>>>>>  On 04/02/2014 09:20 AM, Nevada Sanchez wrote:
>>>>>>>
>>>>>>>>  Okay, we might be on to something:
>>>>>>>>
>>>>>>>> ipa -> ipa2
>>>>>>>> ================================
>>>>>>>> $ LDAPTLS_CACERTDIR=/etc/dirsrv/slapd-EXAMPLE-COM ldapsearch -xLLLZZ
>>>>>>>>  -h ipa2.example.com <http://ipa2.example.com> -s base -b ""
>>>>>>>>
>>>>>>>> 'objectclass=*' vendorVersion
>>>>>>>> dn:
>>>>>>>> vendorVersion: 389-Directory/1.3.1.22.a1 B2014.073.1751
>>>>>>>> ================================
>>>>>>>>
>>>>>>>> ipa2 -> ipa
>>>>>>>> ================================
>>>>>>>> $ LDAPTLS_CACERTDIR=/etc/dirsrv/slapd-EXAMPLE-COM ldapsearch -xLLLZZ
>>>>>>>>  -h ipa.example.com <http://ipa.example.com> -s base -b ""
>>>>>>>>
>>>>>>>> 'objectclass=*' vendorVersion
>>>>>>>> ldap_start_tls: Connect error (-11)
>>>>>>>> additional info: TLS error -8172:Peer's certificate issuer has been
>>>>>>>> marked as not trusted by the user.
>>>>>>>> ================================
>>>>>>>>
>>>>>>>> The original IPA trusts the replica (since it signed the cert, I
>>>>>>>> assume), but the replica doesn't trust the main IPA server. I guess
>>>>>>>> the ZZ option would have shown me the failure that I missed in my
>>>>>>>> initial ldapsearch tests.
>>>>>>>>
>>>>>>>          -Z[Z]  Issue StartTLS (Transport Layer Security) extended
>>>>>>> operation. If
>>>>>>>                you  use  -ZZ, the command will require the operation
>>>>>>> to
>>>>>>> be suc-
>>>>>>>                cessful.
>>>>>>>
>>>>>>> i.e. use SSL, and force a successful handshake
>>>>>>>
>>>>>>>
>>>>>>>> Anyway, what's the best way to remedy this in a way that makes IPA
>>>>>>>> happy? (I've found that LDAP can have different requirements on
>>>>>>>> which
>>>>>>>> certs go where).
>>>>>>>>
>>>>>>>
>>>>>>> I'm not sure.
>>>>>>> ipa-server-install/ipa-replica-prepare/ipa-replica-install
>>>>>>> is supposed to take care of installing the CA cert properly for you.
>>>>>>> If
>>>>>>> you try to hack it and install the CA cert manually, you will
>>>>>>> probably
>>>>>>> miss something else that ipa install did not do.
>>>>>>>
>>>>>>> I think the only way to ensure that you have a properly configured
>>>>>>> ipa
>>>>>>> server + replicas is to get all of the ipa commands completing
>>>>>>> successfully.
>>>>>>>
>>>>>>> Which means going back to the drawing board and starting over from
>>>>>>> scratch.
>>>>>>>
>>>>>>
>>>>>> You can compare the certs that each side is using with:
>>>>>>
>>>>>> # certutil -L -d /etc/dirsrv/slapd-EXAMPLE-COM
>>>>>>
>>>>>> Did you by chance replace the SSL server certs that IPA uses on your
>>>>>> working master?
>>>>>>
>>>>>> rob
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
>
_______________________________________________
Freeipa-users mailing list
Freeipa-users@redhat.com
https://www.redhat.com/mailman/listinfo/freeipa-users

Reply via email to