Re: [389-users] replication from 1.2.8.3 to 1.2.10.4

Robert Viduya Thu, 12 Jul 2012 13:47:24 -0700

On Jul 12, 2012, at 11:36 AM, Rich Megginson wrote:

> On 07/12/2012 08:50 AM, Robert Viduya wrote:
>> On Jul 11, 2012, at 7:17 PM, Rich Megginson wrote:
>> 
>>> On 07/11/2012 11:12 AM, Robert Viduya wrote:
>>>> 
> So is it possible that the hub was


This question seems incomplete?

> 
> ok - please follow the directions at 
> http://port389.org/wiki/FAQ#Debugging_Crashes to enable core files and get a 
> stack trace
> 
> Also, 1.2.10.12 is available in the testing repos.  Please give this a try.  
> There were a couple of fixes made since 1.2.10.4 that may be applicable:
> 
> Ticket 336 [abrt] 389-ds-base-1.2.10.4-2.fc16: index_range_read_ext: Process 
> /usr/sbin/ns-slapd was killed by signal 11 (SIGSEGV)
> Ticket #347 - IPA dirsvr seg-fault during system longevity test
> Ticket #348 - crash in ldap_initialize with multiple threads
> Ticket #361: Bad DNs in ACIs can segfault ns-slapd
> Trac Ticket #359 - Database RUV could mismatch the one in changelog under the 
> stress
> Ticket #382 - DS Shuts down intermittently
> Ticket #390 - [abrt] 389-ds-base-1.2.10.6-1.fc16: slapi_attr_value_cmp: 
> Process /usr/sbin/ns-slapd was killed by signal 11 (SIGSEGV)

I've enabled core dumps, but now I can't get the server to crash.  I'm still 
getting the changelog messages in the error log whenever I restart, though.  
In addition, the hub server keeps running out of disk space.  I tracked it down 
to the access log filling up with MOD messages from replication.  It looks like 
changes are coming down from our 1.2.8 servers and being applied over and over 
again.  As an example, one of our entries was modified three times today, and 
on all our other machines I see the following in the access log file:

# egrep 78b8cc871a3cda9f352580e797b270bc access
[12/Jul/2012:11:00:59 -0400] conn=383671 op=3145 MOD 
dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:11:01:24 -0400] conn=383671 op=3153 MOD 
dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:11:01:38 -0400] conn=383671 op=3157 MOD 
dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"

But on the problematic hub server, I see:

# egrep 78b8cc871a3cda9f352580e797b270bc access
[12/Jul/2012:15:17:29 -0400] conn=2 op=58 MOD 
dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:17:29 -0400] conn=2 op=60 MOD 
dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:17:29 -0400] conn=2 op=61 MOD 
dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:42 -0400] conn=6 op=169 MOD 
dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:42 -0400] conn=6 op=171 MOD 
dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:42 -0400] conn=6 op=172 MOD 
dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:45 -0400] conn=3 op=170 MOD 
dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:45 -0400] conn=3 op=172 MOD 
dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:45 -0400] conn=3 op=173 MOD 
dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:51 -0400] conn=2 op=2234 MOD 
dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:51 -0400] conn=2 op=2236 MOD 
dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:51 -0400] conn=2 op=2237 MOD 
dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:55 -0400] conn=6 op=2233 MOD 
dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:55 -0400] conn=6 op=2235 MOD 
dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:55 -0400] conn=6 op=2236 MOD 
dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
[12/Jul/2012:15:24:57 -0400] conn=3 op=2234 MOD 
dn="gtdirguid=78b8cc871a3cda9f352580e797b270bc,ou=accounts,ou=gtaccounts,ou=departments,dc=gted,dc=gatech,dc=edu"
...

I truncated the output for brevity, but there are over 250 MODs to that one 
object.  It's as if the server isn't able to do the replication bookkeeping and 
is accepting the same changes over and over again.  Eventually the disk fills up.
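
A rough way to see how widespread the repeats are is to count MOD operations 
per DN across the whole access log instead of grepping one entry at a time.  
The sketch below assumes each MOD record sits on a single line in the log 
file (the wrapping above comes from the mail client) and that the line looks 
like the excerpts shown.  The names count_mods and noisy_dns and the threshold 
of 50 are illustrative choices, not part of 389-ds.

```python
import re
import sys
from collections import Counter

# Assumed shape of an unwrapped MOD record in the access log:
# [12/Jul/2012:15:17:29 -0400] conn=2 op=58 MOD dn="gtdirguid=...,dc=edu"
MOD_LINE = re.compile(r'conn=(\d+) op=(\d+) MOD dn="([^"]*)"')


def count_mods(lines):
    """Return a Counter mapping each DN to the number of MOD lines seen."""
    counts = Counter()
    for line in lines:
        match = MOD_LINE.search(line)
        if match:
            # Lowercase the DN so differently cased copies count together.
            counts[match.group(3).lower()] += 1
    return counts


def noisy_dns(counts, threshold):
    """Return (dn, count) pairs with at least `threshold` MODs, busiest first."""
    return [(dn, n) for dn, n in counts.most_common() if n >= threshold]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_mods.py /path/to/access
    with open(sys.argv[1], errors="replace") as fh:
        for dn, n in noisy_dns(count_mods(fh), threshold=50):
            print(f"{n:6d}  {dn}")
```

Running the same script against a healthy consumer and against the hub should 
make the difference obvious: an entry modified three times shows a count of 3 
on the former and a count in the hundreds on the latter.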

I just upgraded the hub to 1.2.10.12 as suggested and, to be safe, I'm doing a 
clean import.  We'll see how it goes.

--
389 users mailing list
[email protected]
https://admin.fedoraproject.org/mailman/listinfo/389-users
