[389-users] Problem starting and replicating RHDS9

Ric Tue, 01 Oct 2013 05:20:09 -0700

Hello All,

I hope you can forgive a request which I am sure doesn't have enough
information in it. Please let me know what else I can add that might
help.


I have a problem with our installation of RHDS9 and practically
nothing in the logs to suggest where to look.

We have a multi-master pair, with DNS round robin to load balance.
Because of the problem I have updated DNS to point all traffic at the
working server, so I hope I can fix this without impacting the users.
But since I don't know the cause, I'm concerned the same failure may
occur on the working server and prevent all logins. :(

We first noticed that replication was not working. Now it seems that I
can't get slapd to start on one of the pair.
I have restarted both dirsrv instances and both servers.

There is woefully little in the log files, but if there is a way to
increase logging levels I haven't found it yet. If there is, please
advise and I'll do that and post.

This is the info I have gathered so far. Please let me know what else
might help.


/usr/sbin/ns-slapd -v
389 Project
389-Directory/1.2.11.15 B2013.211.1952

dirsrv dir01 is stopped
There is no:
/var/run/dirsrv/slapd-dir01.pid

# service dirsrv start
  *** Error: 1 instance(s) failed to start

The start-up runs the wait loop and finally exits with the message above.
The errors log includes these messages:

[01/Oct/2013:12:14:47 +0100] - 389-Directory/1.2.11.15 B2013.211.1952
starting up
[01/Oct/2013:12:14:47 +0100] - WARNING: userRoot: entry cache size
10485760B is less than db size 10739712B; We recommend to increase the
entry cache size nsslapd-cachememsize.


The start-up process leaves one slapd running:
# ps -ef |grep slapd
dsuser   12560     1  0 09:51 ?        00:00:03 /usr/sbin/ns-slapd -D
/etc/dirsrv/slapd-dir01 -i /var/run/dirsrv/slapd-dir01.pid -w
/var/run/dirsrv/slapd-dir01.startpid

but no working ns-slapd.
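
The state described above (no .pid file, but a process started with
-w .../slapd-dir01.startpid) can be checked with a small sketch like
the one below. The function name instance_state is made up for
illustration, and it assumes the start script writes
slapd-INSTANCE.startpid first and slapd-INSTANCE.pid only once the
server is fully up; I have not confirmed that against the init script.

```shell
# Hypothetical helper: report which pid files exist for an instance.
# Usage: instance_state RUN_DIR INSTANCE
# Assumption: RUN_DIR holds slapd-INSTANCE.pid and slapd-INSTANCE.startpid.
instance_state() {
    run_dir=$1
    inst=$2
    if [ -f "$run_dir/slapd-$inst.pid" ]; then
        # Main pid file present: start-up got as far as writing it
        echo "pidfile"
    elif [ -f "$run_dir/slapd-$inst.startpid" ]; then
        # Only the start pid file: process launched but never finished starting
        echo "startpid-only"
    else
        # Neither file present
        echo "none"
    fi
}
```

For example, `instance_state /var/run/dirsrv dir01` would print
"startpid-only" if the failed server is stuck part-way through start-up.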

I recognise that we need to tune the cache, but don't believe that it
will cause the start-up failure, just a performance hit. To tune via
the console I suspect I have to get it running first!
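
To put the cache warning in perspective, here is a minimal sketch that
pulls the two sizes out of the warning and prints the difference. The
function name cache_shortfall is made up for illustration, and it
assumes the warning sits on a single line in the errors log (it is only
wrapped in this mail).

```shell
# Hypothetical helper: read errors-log lines on stdin and, for each
# entry cache warning, print (db size - entry cache size) in bytes.
cache_shortfall() {
    sed -n 's/.*entry cache size \([0-9]*\)B is less than db size \([0-9]*\)B.*/\1 \2/p' |
    while read -r cache db; do
        echo $(( db - cache ))
    done
}
```

Fed the warning above, this prints 253952, so the cache is about 248 KiB
short of the db size, which looks like a tuning matter rather than a
reason for slapd to fail to start.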
The working server shows the same warning, along with:

[01/Oct/2013:12:16:26 +0100] slapi_ldap_bind - Error: could not send
bind request for id [cn=repman,cn=config] mech [SIMPLE]: error -1
(Can't contact LDAP server) 0 (unknown) 107 (Transport endpoint is not
connected)

Which makes sense.

The errors and access logs contain nothing else at all, so there is
nothing to indicate what is failing.

Any ideas where I might start will be greatly welcomed.

Many thanks, Ric.
--
389 users mailing list
389-users@lists.fedoraproject.org
https://admin.fedoraproject.org/mailman/listinfo/389-users
