This has been filed upstream as ITS#5362

--Quanah

--On February 4, 2008 8:33:30 PM +0100 Ralph Rößner <[EMAIL PROTECTED]> wrote:

Package: slapd
Version: 2.4.7-3
Severity: Important

Hi,

when our syncrepl consumers (refreshOnly mode) query the provider for
changes, the provider will sometimes send back an intermediate message
that has the synchronization cookie truncated (the csn is missing). This
causes the consumer to die (segfault). Upon restart, the consumer
database will be empty. In a rarer case, the consumer will survive but
have its database cleaned out as well. This problem appeared after the
upgrade from 2.3.83-1+lenny1.

Our LDAP infrastructure contains a syncrepl provider and three consumers
in refreshOnly mode. Two of the consumers get an identical subset of the
data and are configured alike except for the replication user, while the
third serves a different purpose. All consumers have been hit by the
problem; the ones configured alike die at the same time. The problem
appears at apparently random intervals, from a few hours to a few days.

Since then I have tried a few changes to our configuration and an
upgrade to 2.4.7-4, mainly to keep things alive (our mail customers were
not happy). This has yielded only one result: switching to
refreshAndPersist mode avoids the problem. I had one of the two
identically configured consumers running in refreshAndPersist mode, and
it survived when the other failed.

I have set up a test consumer server, copying the existing
configuration, and it has nicely duplicated the problem, even
reproducibly for a stretch of time. So I am able to provide sane (i.e.
without a lot of queries for mail addresses) debug logs that show the
consumer failing. I have also captured a debug log of the provider
working on the replication query. It is from a later point in time,
since restarting the provider to change the log level cleared the
problem for a while.

You will notice in the logs that the intermediate message returned to
the client contains a cookie that stops after the "csn=" string, i.e. it
does not actually contain a value for the csn. I think that is what
kills the consumer. I don't have a clue why the provider does that.
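
To make the symptom concrete, here is a minimal Python sketch of a check
that tells a complete cookie from a truncated one. It is not taken from
slapd. It assumes the cookie is a comma-separated list of key=value
pairs such as "rid=...,csn=...", and the sample cookie values and the
function names are made up for illustration.

```python
def parse_sync_cookie(cookie: str) -> dict:
    """Split a cookie string into its key=value fields.

    Assumption: fields are separated by commas and each field has the
    form key=value. Fields without '=' are ignored.
    """
    fields = {}
    for part in cookie.split(","):
        key, sep, value = part.partition("=")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def cookie_has_csn(cookie: str) -> bool:
    """Return True only if the cookie carries a non-empty csn value."""
    return bool(parse_sync_cookie(cookie).get("csn"))


# Hypothetical sample values, not copied from the attached logs.
complete = "rid=001,csn=20080204193330Z#000000#00#000000"
truncated = "rid=001,csn="

print(cookie_has_csn(complete))   # a cookie with a csn value
print(cookie_has_csn(truncated))  # the failing case: "csn=" with no value
```

A consumer-side check along these lines would reject the truncated
cookie instead of acting on it. Whether that is the right place for a
fix is a separate question.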

I have provided a network trace (in pcap format) of the exchange,
leaving out the handshake and bind request message to avoid password
disclosure. Unless I'm mistaken, the refreshDeletes flag of the
intermediate message is set to TRUE, indicating multiple deletes
(right?). This fits well with the rare case of the consumer deleting all
its entries (which I have not been able to get logs of so far).

From the usual use of our provider server I would have expected zero or
one changes within the poll interval, and definitely no deleted objects.
So the fact that the provider is trying to send a sync id set at all
and flag it as deletes looks suspicious to me. The test consumer server
has never logged such an intermediate message in reaction to a
synchronization search except in these fatal cases, for the few days
that it has been running with debugging enabled.

Now I hope that someone has an idea about what might be going wrong in
the provider server. I can just speculate that the problems we observe
are symptoms of a deeper problem.

Some software versions:

slapd: 2.4.7-3
libc6: 2.7-6
libdb4.2: 4.2.52+dfsg-4
libgnutls13: 2.0.4-1
libiodbc2: 3.52.6-1
libldap-2.4-2: 2.4.7-3

Attached files:

slapd.conf.keldon - provider configuration file
slapd.conf.gorkon - test consumer configuration file
slapd.crash.capture - network trace of the consumer - provider
                    communication while performing the deadly replication
slapd.crash.strace - syscall trace of the consumer at the same time
slapd.crash.log -   debug log of the consumer, levels
                    sync+stats+acl+trace, at the same time
provider.log -      debug log of the provider, levels sync+stats+trace,
                    at a later deadly replication

Sincerely,
   Ralph Rößner

--
Ralph Rößner
CAPCom AG < http://www.capcom.de >
Rundeturmstr. 10, 64283 Darmstadt, Germany
Phone +49 6151 155 900, Fax +49 6151 155 909

Vorstand: Luc Neumann (Vorsitzender)
Vorsitzender des Aufsichtsrats: Prof. Dr.-Ing. José L. Encarnação
Sitz der Gesellschaft: Darmstadt, Registergericht: Darmstadt HRB 8090



--

Quanah Gibson-Mount
Principal Software Engineer
Zimbra, Inc
--------------------
Zimbra ::  the leader in open source messaging and collaboration

