Bug#419222: [Pkg-openldap-devel] Bug#419222: it crashed again

2007-05-04 Thread Steve Langasek
On Thu, May 03, 2007 at 08:57:10PM +0200, Gyuris Szabolcs wrote:
> I stopped the slapd, then started and tried to run slapcat:
>
> bdb_db_open: unclean shutdown detected; attempting recovery.
> bdb_db_open: Recovery skipped in read-only mode. Run manual recovery if
> errors are encountered.
> bdb_db_open: alock_recover failed
> bdb_db_close: alock_close failed
> backend_startup_one: bi_db_open failed! (-1)
> slap_startup failed
> Segmentation fault.
>
> I have no db_stat :(

That would be db4.2_stat, in the db4.2-util package.
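
A plain `locate db_stat` misses the tool because Debian prefixes the binary name with the library version. As a minimal sketch (the candidate list and the helper name `find_db_stat` are assumptions for illustration, not part of any package), the lookup could be done like this:

```python
import shutil

# Candidate binary names. "db4.2_stat" is the Debian name given in the reply
# above; the unversioned "db_stat" is the upstream name. Both are only
# candidates, and neither is guaranteed to be installed.
CANDIDATES = ("db4.2_stat", "db_stat")


def find_db_stat(candidates=CANDIDATES, path=None):
    """Return the full path of the first candidate found, or None.

    `path` is an optional search path in PATH syntax; when it is None,
    the PATH environment variable is searched.
    """
    for name in candidates:
        found = shutil.which(name, path=path)
        if found:
            return found
    return None


if __name__ == "__main__":
    print(find_db_stat() or "no db_stat binary found; is db4.2-util installed?")
```

If nothing is found, installing the db4.2-util package named above is the likely fix.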

-- 
Steve Langasek   Give me a lever long enough and a Free OS
Debian Developer   to set it on, and I can move the world.
[EMAIL PROTECTED]   http://www.debian.org/


-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED]
with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]



Bug#419222: [Pkg-openldap-devel] Bug#419222: it crashed again - news

2007-05-04 Thread Gyuris Szabolcs
Finally I found some relevant logs:

May  3 19:46:43

conn=63 op=1542 DEL
dn=uid=user_domain1.com,domain=domain1.com,ou=virtualFtp,dc=domain,dc=hu
conn=63 op=1542 RESULT tag=107 err=0 text=
conn=63 op=1543 DEL
dn=domain=domain2.co.hu,domain=domain1.com,ou=virtualFtp,dc=domain,dc=hu
bdb(dc=domain,dc=hu): page 3: illegal page type or format
bdb(dc=domain,dc=hu): PANIC: Invalid argument
=> bdb_idl_delete_key: c_close failed: DB_RUNRECOVERY: Fatal error, run
database recovery (-30978)
bdb(dc=domain,dc=hu): PANIC: fatal region error detected; run recovery
conn=63 op=1543 RESULT tag=107 err=80 text=entry index delete failed
conn=63 op=1544 DEL
dn=domain=domain2.co.hu,domain=domain1.com,ou=virtualWeb,dc=domain,dc=hu
bdb(dc=domain,dc=hu): PANIC: fatal region error detected; run recovery
conn=63 op=1544 RESULT tag=107 err=80 text=internal error
conn=63 op=1545 DEL
dn=domain=domain3.org,domain=domain1.com,ou=virtualFtp,dc=domain,dc=hu
bdb(dc=domain,dc=hu): PANIC: fatal region error detected; run recovery
conn=63 op=1545 RESULT tag=107 err=80 text=internal error


I think conn=63 is the connection from slurpd, the master LDAP
server's replication daemon.

The objects being deleted did exist in the slave LDAP database:

May  2 18:25:57

conn=63 op=1310 ADD
dn=domain=domain2.co.hu,domain=domain1.com,ou=virtualWeb,dc=domain,dc=hu
conn=63 op=1310 RESULT tag=105 err=0 text=
conn=63 op=1311 ADD
dn=domain=domain2.co.hu,domain=domain1.com,ou=virtualFtp,dc=domain,dc=hu
conn=63 op=1311 RESULT tag=105 err=0 text=

What happened here? There were no shutdowns, no crashes, just:
bdb(dc=domain,dc=hu): page 3: illegal page type or format
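
To see at a glance which replicated operations failed, the log excerpt can be scanned for RESULT lines with a non-zero `err`. The following is a rough sketch that assumes the layout shown in this message (the `dn=` value on the line after the request); real slapd logs may put it on the same line. The function name `failed_operations` is made up for illustration.

```python
import re

# Request lines such as "conn=63 op=1543 DEL" (layout as in the excerpt above).
REQ = re.compile(r"conn=(\d+) op=(\d+) (ADD|DEL|MOD|MODRDN)\b")
# Result lines such as "conn=63 op=1543 RESULT tag=107 err=80 text=...".
RES = re.compile(r"conn=(\d+) op=(\d+) RESULT tag=(\d+) err=(\d+) text=(.*)")
# Assumption: the DN appears on its own line right after the request line.
DN = re.compile(r"^dn=(.*)$")


def failed_operations(lines):
    """Return one dict per operation whose RESULT has err != 0."""
    ops = {}       # (conn, op) -> {"type": ..., "dn": ...}
    last = None    # key of the most recent request, used to attach the DN
    failures = []
    for raw in lines:
        line = raw.strip()
        m = RES.search(line)
        if m:
            key = (int(m.group(1)), int(m.group(2)))
            err = int(m.group(4))
            if err != 0:
                info = ops.get(key, {})
                failures.append({
                    "conn": key[0],
                    "op": key[1],
                    "type": info.get("type"),
                    "dn": info.get("dn"),
                    "err": err,  # 80 is the LDAP "other" result code
                    "text": m.group(5),
                })
            continue
        m = REQ.search(line)
        if m:
            last = (int(m.group(1)), int(m.group(2)))
            ops[last] = {"type": m.group(3), "dn": None}
            continue
        m = DN.match(line)
        if m and last is not None:
            ops[last]["dn"] = m.group(1)
    return failures


if __name__ == "__main__":
    import sys
    for f in failed_operations(sys.stdin):
        print(f)
```

Fed the May 3 excerpt, this would report ops 1543 through 1545 on conn=63, all with err=80, while op 1542 succeeded.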




Bug#419222: [Pkg-openldap-devel] Bug#419222: it crashed again - news

2007-05-04 Thread Quanah Gibson-Mount
--On Friday, May 04, 2007 11:33 AM +0200 Gyuris Szabolcs [EMAIL PROTECTED] wrote:

> Finally I found some relevant logs:
>
> May  3 19:46:43
>
> conn=63 op=1542 DEL
> dn=uid=user_domain1.com,domain=domain1.com,ou=virtualFtp,dc=domain,dc=hu
> conn=63 op=1542 RESULT tag=107 err=0 text=
> conn=63 op=1543 DEL
> dn=domain=domain2.co.hu,domain=domain1.com,ou=virtualFtp,dc=domain,dc=hu
> bdb(dc=domain,dc=hu): page 3: illegal page type or format
> bdb(dc=domain,dc=hu): PANIC: Invalid argument
> => bdb_idl_delete_key: c_close failed: DB_RUNRECOVERY: Fatal error, run
> database recovery (-30978)
> bdb(dc=domain,dc=hu): PANIC: fatal region error detected; run recovery
> conn=63 op=1543 RESULT tag=107 err=80 text=entry index delete failed
> conn=63 op=1544 DEL
> dn=domain=domain2.co.hu,domain=domain1.com,ou=virtualWeb,dc=domain,dc=hu
> bdb(dc=domain,dc=hu): PANIC: fatal region error detected; run recovery
> conn=63 op=1544 RESULT tag=107 err=80 text=internal error
> conn=63 op=1545 DEL
> dn=domain=domain3.org,domain=domain1.com,ou=virtualFtp,dc=domain,dc=hu
> bdb(dc=domain,dc=hu): PANIC: fatal region error detected; run recovery
> conn=63 op=1545 RESULT tag=107 err=80 text=internal error
>
> I think conn=63 is the connection from slurpd, the master LDAP
> server's replication daemon.
>
> The objects being deleted did exist in the slave LDAP database:
>
> May  2 18:25:57
>
> conn=63 op=1310 ADD
> dn=domain=domain2.co.hu,domain=domain1.com,ou=virtualWeb,dc=domain,dc=hu
> conn=63 op=1310 RESULT tag=105 err=0 text=
> conn=63 op=1311 ADD
> dn=domain=domain2.co.hu,domain=domain1.com,ou=virtualFtp,dc=domain,dc=hu
> conn=63 op=1311 RESULT tag=105 err=0 text=
>
> What happened here? There were no shutdowns, no crashes, just:
> bdb(dc=domain,dc=hu): page 3: illegal page type or format


Well, that is a new one to me. :/  Google isn't helping much with it
either; I'll consult upstream and see what I get.


--Quanah

--
Quanah Gibson-Mount
Principal Software Engineer
Zimbra, Inc

Zimbra ::  the leader in open source messaging and collaboration





Bug#419222: it crashed again

2007-05-03 Thread Gyuris Szabolcs
I shouted from the rooftops.

The slapd made some big mistake, and the Nagios plugin said: Could not
search/find objectclasses in dc=domain,dc=tld

I stopped the slapd, then started and tried to run slapcat:

bdb_db_open: unclean shutdown detected; attempting recovery.
bdb_db_open: Recovery skipped in read-only mode. Run manual recovery if
errors are encountered.
bdb_db_open: alock_recover failed
bdb_db_close: alock_close failed
backend_startup_one: bi_db_open failed! (-1)
slap_startup failed
Segmentation fault.


I have no db_stat :(

# locate db_stat
/usr/share/mysql/mysql-test/r/have_ndb_status_ok.require

I guess that the failure occurs when the master slapd makes an update
to the slave... :(




Bug#419222: [Pkg-openldap-devel] Bug#419222: it crashed again

2007-05-03 Thread Quanah Gibson-Mount



--On May 3, 2007 8:57:10 PM +0200 Gyuris Szabolcs [EMAIL PROTECTED] wrote:

> I shouted from the rooftops.
>
> The slapd made some big mistake, and the Nagios plugin said: Could not
> search/find objectclasses in dc=domain,dc=tld
>
> I stopped the slapd, then started and tried to run slapcat:
>
> bdb_db_open: unclean shutdown detected; attempting recovery.
> bdb_db_open: Recovery skipped in read-only mode. Run manual recovery if
> errors are encountered.


This says that you are running in read-only mode, so it will not try and 
recover the database.  And it appears to believe you have had an unclean 
shutdown.  How was slapd and/or the system last stopped?


--Quanah





Bug#419222: [Pkg-openldap-devel] Bug#419222: it crashed again

2007-05-03 Thread Gyuris Szabolcs
Quanah Gibson-Mount wrote:
> --On May 3, 2007 8:57:10 PM +0200 Gyuris Szabolcs [EMAIL PROTECTED] wrote:
>
>> I shouted from the rooftops.
>>
>> The slapd made some big mistake, and the Nagios plugin said: Could not
>> search/find objectclasses in dc=domain,dc=tld
>>
>> I stopped the slapd, then started and tried to run slapcat:
>>
>> bdb_db_open: unclean shutdown detected; attempting recovery.
>> bdb_db_open: Recovery skipped in read-only mode. Run manual recovery if
>> errors are encountered.
>
> This says that you are running in read-only mode, so it will not try and
> recover the database.  And it appears to believe you have had an unclean
> shutdown.  How was slapd and/or the system last stopped?

It wasn't stopped at all. I think that when the master slapd tries to make
an update to the slave, something horrible happens and the database
becomes corrupt.






Bug#419222: [Pkg-openldap-devel] Bug#419222: it crashed again

2007-05-03 Thread Quanah Gibson-Mount



--On May 3, 2007 9:53:25 PM +0200 Gyuris Szabolcs [EMAIL PROTECTED] wrote:

> Quanah Gibson-Mount wrote:
>> --On May 3, 2007 8:57:10 PM +0200 Gyuris Szabolcs [EMAIL PROTECTED] wrote:
>>
>>> I shouted from the rooftops.
>>>
>>> The slapd made some big mistake, and the Nagios plugin said: Could not
>>> search/find objectclasses in dc=domain,dc=tld
>>>
>>> I stopped the slapd, then started and tried to run slapcat:
>>>
>>> bdb_db_open: unclean shutdown detected; attempting recovery.
>>> bdb_db_open: Recovery skipped in read-only mode. Run manual recovery if
>>> errors are encountered.
>>
>> This says that you are running in read-only mode, so it will not try and
>> recover the database.  And it appears to believe you have had an unclean
>> shutdown.  How was slapd and/or the system last stopped?
>
> It wasn't stopped at all. I think that when the master slapd tries to make
> an update to the slave, something horrible happens and the database
> becomes corrupt.


There's something really odd about your environment.  The error about an 
unclean shutdown only occurs if slapd has been stopped in an unclean 
fashion.  Now you note that you stopped slapd and then ran slapcat.  It 
sounds then like slapd faulted on shutdown rather than shutting down 
cleanly, which means you would have to run db_recover prior to running 
slapcat.  I recall having an issue on slapd shutdown recently, let me see 
if I can go dig that up.
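
The order of operations described here (recover first, then dump) can be written down as a small plan. This is only a sketch under stated assumptions: the binary name `db4.2_recover` is assumed by analogy with `db4.2_stat`, `/var/lib/ldap` is assumed to be the database directory, and `recovery_plan` is a made-up helper. It builds the commands without running them, and slapd should be stopped before any of them is executed.

```python
import shlex


def recovery_plan(db_home="/var/lib/ldap", ldif_out="backup.ldif",
                  recover_bin="db4.2_recover"):
    """Return the commands to run, in order, as argv lists.

    All defaults are assumptions and should be adjusted to the local setup.
    Nothing is executed here.
    """
    return [
        # Berkeley DB recovery on the environment directory:
        # -h selects the environment home, -v is verbose.
        [recover_bin, "-v", "-h", db_home],
        # Only after recovery succeeds: dump the directory to LDIF.
        ["slapcat", "-l", ldif_out],
    ]


if __name__ == "__main__":
    for argv in recovery_plan():
        print(" ".join(shlex.quote(a) for a in argv))
```

Printing the plan rather than executing it leaves the decision to run recovery on a possibly corrupt database with the administrator.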


--Quanah





Bug#419222: [Pkg-openldap-devel] Bug#419222: it crashed again

2007-05-03 Thread Quanah Gibson-Mount



--On May 3, 2007 1:45:42 PM -0700 Quanah Gibson-Mount [EMAIL PROTECTED] wrote:

> There's something really odd about your environment.  The error about an
> unclean shutdown only occurs if slapd has been stopped in an unclean
> fashion.  Now you note that you stopped slapd and then ran slapcat.  It
> sounds then like slapd faulted on shutdown rather than shutting down
> cleanly, which means you would have to run db_recover prior to running
> slapcat.  I recall having an issue on slapd shutdown recently, let me see
> if I can go dig that up.


Okay, I found it.  ITS#4855 and ITS#4899 deal with slapd crashing on exit.
The patches are to libldap_r/tpool.c:

http://www.openldap.org/devel/cvsweb.cgi/libraries/libldap_r/tpool.c?hideattic=1&only_on_branch=OPENLDAP_REL_ENG_2_3

Revisions 1.30.2.18 and 1.30.2.19 respectively deal with this issue; however,
the changes made in 1.30.2.17 need to go in first.  Unfortunately, I'm not
sure if the changes for ITS#4805 (the 1.30.2.17 commit) are isolated from
other commits made for that ITS.

The diff for the 3 versions does apply cleanly:

http://www.openldap.org/devel/cvsweb.cgi/libraries/libldap_r/tpool.c.diff?hideattic=1&r1=text&tr1=1.30.2.16&r2=text&tr2=1.30.2.19&f=u

--Quanah
