Date: Tue, 27 Nov 2007 15:27:47 +0100

Hi all,

Our client's live server (Red Hat) experienced some problems this
morning, and as a result they had to restart it. But apparently ldap
wasn't able to do everything it wanted before it got killed, because
during bootup it said:

Checking configuration files for : bdb_db_open: unclean shutdown detected; attempting recovery.
bdb_db_open: Recovery skipped in read-only mode. Run manual recovery if errors are encountered.

Seems like a custom install; that message comes from OpenLDAP 2.3+ and as far as I know Red Hat has only bundled OpenLDAP 2.2 and older so far. (Of course it's possible that they've updated recently, I dunno.) But since you haven't given specific version numbers for either Red Hat or the LDAP software, that's just a guess.

Then the server just halted. So no prompt, and no ssh or anything.
They had to reboot in single user mode and then disable ldap
(chkconfig --level 235 ldap off).

Then the reboot worked as it should, and I could connect over ssh. I
tried db_recover (as ldap user, in the correct folder) but I got the
following:

db_recover: DB_LOGC->get: log record LSN 122/68320410: checksum mismatch
db_recover: DB_LOGC->get: catastrophic recovery may be required
db_recover: PANIC: DB_RUNRECOVERY: Fatal error, run database recovery
db_recover: PANIC: DB_RUNRECOVERY: Fatal error, run database recovery
db_recover: PANIC: DB_RUNRECOVERY: Fatal error, run database recovery
db_recover: DB_ENV->open: DB_RUNRECOVERY: Fatal error, run database recovery

I then tried db_recover -c but that gave the same result. It seemed
like the transaction log file 122 was corrupt, so I tried moving all
log files up to and including 122 to a separate folder and then ran
db_recover (with parameters -cv) but got:

1. How come db_recover couldn't fix this problem? It said I should run
a catastrophic recovery, but I was doing that! Very strange...

Most likely you were using a version of db_recover that didn't match the version of BerkeleyDB that OpenLDAP was using.

2. If ldap is now working, and we have a full backup of the bdb
database files, do I need any of the old transaction files (including
the corrupt one, 122)?

I don't think those log files are of any use to you now. By starting the server without a successful db_recover, you've made it generate a new sequence of log files and your old log sequence numbers are probably now invalid anyway.

3. How come this "unclean shutdown detected" error can make the whole
system halt during startup? How can I make it work again?

Sounds like a bug in your startup script. Most likely it's invoking slaptest and getting an error there, and then giving up instead of starting slapd at that point. In my opinion, using slaptest to verify your configuration syntax in the startup script is stupid, and far too late to be checking. You should just delete/comment that out and have it run slapd unconditionally. You should be checking the configuration integrity right after making a change, not during system startup.

Normally slapd (in OpenLDAP 2.3+) will perform whatever recovery is necessary automatically. The "run manual recovery" message above is deceptive, but slapd itself will never output that message (slaptest or slapcat may). So once again, there's something running in your startup script before the actual slapd invocation that's causing this failure, and it quite plainly doesn't belong in the script.

4. The main problem seems to have been that ldap wasn't able to write
all its file changes before it got killed during the shutdown/reboot.
What can I do to make this less likely to happen (or stop it from
happening completely)? I read somewhere about "committing the logs more
frequently", but how?

As already mentioned, use the checkpoint directive.
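
For illustration, a minimal sketch of what that might look like in slapd.conf for a back-bdb database. The suffix, the directory, and the two numbers are placeholder assumptions, not values taken from this thread. The directive takes a size in kilobytes and an interval in minutes, and a checkpoint is taken once either threshold is reached.

```
# Hypothetical back-bdb section; suffix and directory are placeholders.
database    bdb
suffix      "dc=example,dc=com"
directory   /var/lib/ldap

# checkpoint <kbyte> <min>: checkpoint after 1024 KB have been written
# to the log or 15 minutes have passed since the last checkpoint,
# whichever comes first. The numbers are illustrative, not a recommendation.
checkpoint  1024 15
```

More frequent checkpoints shorten the stretch of log that recovery has to replay after an unclean shutdown, at the cost of some extra disk writes.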

In the meantime, read the BerkeleyDB documentation.

--
   -- Howard Chu
   Chief Architect, Symas Corp.
   Director, Highland Sun
   Chief Architect, OpenLDAP
