Hi,

Last Monday we tried to update a database, which is mainly used by our
Java applications, from SAP DB version 7.4 (ASCII) to MaxDB version 7.6.0.36
(UNICODE) and then to 7.6.1.15. We use the same binaries and installation
procedure as for our SAP systems. Although this database is not used by any
of our SAP systems, the bug could threaten them too.

Here are the messages from knldiag and knldiag.err:

knldiag:
---8<---
[...]
2007-05-28 13:13:15 30386     12929 TASKING  Task T142 started
2007-05-28 13:13:15 30386     11007 COMMUNIC wait for connection T142
2007-05-28 13:13:15 30386     11561 COMMUNIC Connected  T142 japs02.saarstahl.de 0
2007-05-28 13:13:15 30386     11560 COMMUNIC Releasing  T142
2007-05-28 13:13:15 30386     12827 COMMUNIC wait for connection T142
2007-05-28 13:13:15 30370     11561 COMMUNIC Connecting T177 japs02.saarstahl.de 0
2007-05-28 13:13:15 30386     12929 TASKING  Task T177 started
2007-05-28 13:13:15 30386     11007 COMMUNIC wait for connection T177
2007-05-28 13:13:15 30386     11561 COMMUNIC Connected  T177 japs02.saarstahl.de 0
2007-05-28 13:13:15 30386     11560 COMMUNIC Releasing  T177
2007-05-28 13:13:15 30386     12827 COMMUNIC wait for connection T177
+++++++++++++++++++++++++++++++++++++++ Kernel Exit ++++++++++++++++++++++++++++
2007-05-28 13:13:51     0     11987 dump_rte rtedump written to file 'rtedump'
2007-05-28 13:13:51     0 ERR 12005 DBCRASH  Kernel exited with core and exit status 0x8b
2007-05-28 13:13:51     0 ERR 12012 DBCRASH  No stack backtrace since signal handler was suppressed by SUPPRESS_CORE=NO
2007-05-28 13:13:51     0 ERR 12009 DBCRASH  Kernel exited due to signal 11(SIGSEGV)
2007-05-28 13:13:51     0     12808 DBSTATE  Flushing knltrace pages
2007-05-28 13:13:52     0 WNG 11824 COMMUNIC Releasing  T113 kernel abort
2007-05-28 13:13:52     0 WNG 11824 COMMUNIC Releasing  T121 kernel abort
2007-05-28 13:13:52     0 WNG 11824 COMMUNIC Releasing  T134 kernel abort
2007-05-28 13:13:52     0 WNG 11824 COMMUNIC Releasing  T147 kernel abort
2007-05-28 13:13:52     0 WNG 11824 COMMUNIC Releasing  T153 kernel abort
2007-05-28 13:13:52     0 WNG 11824 COMMUNIC Releasing  T184 kernel abort
2007-05-28 13:13:52     0 WNG 11824 COMMUNIC Releasing  T213 kernel abort
2007-05-28 13:13:52     0 WNG 11824 COMMUNIC Releasing  T242 kernel abort
2007-05-28 13:13:52     0 WNG 11824 COMMUNIC Releasing  T273 kernel abort
2007-05-28 13:13:52     0 WNG 11824 COMMUNIC Releasing  T278 kernel abort
2007-05-28 13:13:52     0 WNG 11824 COMMUNIC Releasing  T289 kernel abort
2007-05-28 13:13:52     0 WNG 11824 COMMUNIC Releasing  T295 kernel abort
2007-05-28 13:13:52     0     12696 DBSTATE  Change DbState to 'OFFLINE '(29)
---8<---

knldiag.err:
---8<---
[...]
2007-05-28 13:13:51     0 ERR 12005 DBCRASH  Kernel exited with core and exit status 0x8b
2007-05-28 13:13:51     0 ERR 12012 DBCRASH  No stack backtrace since signal handler was suppressed by SUPPRESS_CORE=NO
2007-05-28 13:13:51     0 ERR 12009 DBCRASH  Kernel exited due to signal 11(SIGSEGV)
2007-05-28 13:13:52                          ___ Stopping GMT 2007-05-28 11:13:52           7.6.01   Build 015-121-147-649
---8<---
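
As a cross-check, the exit status 0x8b is consistent with the SIGSEGV
message if it is read as a raw POSIX-style wait status (low 7 bits are the
signal, bit 7 is the core-dump flag). That reading is an assumption; I have
not confirmed how the MaxDB runtime reports this value. The helper name
decode_wait_status is only illustrative. A minimal sketch:

```python
# Sketch: decode the exit status from knldiag, ASSUMING it is a raw
# POSIX-style wait status (not confirmed against MaxDB documentation).
def decode_wait_status(status):
    signal_number = status & 0x7F      # low 7 bits: terminating signal
    core_dumped = bool(status & 0x80)  # bit 7: a core file was written
    return signal_number, core_dumped

signal_number, core_dumped = decode_wait_status(0x8B)
print(signal_number, core_dumped)  # prints: 11 True
```

Under that assumption, 0x8b means "killed by signal 11, core written",
which matches messages 12005 and 12009.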

A stack backtrace with gdb showed this:

---8<---
(gdb) bt
#0  0x0000000000823675 in a93swap_from_application ()
#1  0x0000000000823cba in ak93vreceive ()
#2  0x0000000000823f29 in a93_user_commands ()
#3  0x0000000000d8ae04 in SQLTask ()
#4  0x0000000000d8afe3 in Kernel_Main ()
#5  0x0000000000e13a48 in RTETask_TaskMain ()
#6  0x0000000000ec119c in en88_CallKernelTaskMain ()
#7  0x00002b0b5740fe70 in __correctly_grouped_prefixwc () from /lib64/libc.so.6
#8  0x0000000000000000 in ?? ()
---8<---

---8<---
(gdb) info registers
rax            0x4c535f50       1280532304
rbx            0x142bfc0        21151680
rcx            0x2      2
rdx            0x4c535f5e       1280532318
rsi            0x3      3
rdi            0x4c535f50       1280532304
rbp            0x2aacb201ce5f   0x2aacb201ce5f
rsp            0x2aacb201cd60   0x2aacb201cd60
r8             0x2aacb201cd54   46921209204052
r9             0x2aacb63d4fdc   46921280212956
r10            0x2aaaaac2bc08   46912497695752
r11            0x212    530
r12            0x2aad0290af30   46922560745264
r13            0x2aacb201d320   46921209205536
r14            0x2aacb201cede   46921209204446
r15            0x2aacb63d4f88   46921280212872
rip            0x823675 0x823675 <a93swap_from_application+453>
eflags         0x10203  [ CF IF RF ]
cs             0x33     51
ss             0x2b     43
ds             0x0      0
es             0x0      0
fs             0x63     99
gs             0x0      0
---8<---
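
One detail in the register dump may be worth a look: rax and rdi both hold
0x4c535f50, which does not resemble the other pointers (0x2aac...), and all
four of its bytes are printable ASCII. This may be pure coincidence, so take
it as a guess and not as a diagnosis. The helper name as_printable_ascii is
only illustrative. A small sketch of the arithmetic:

```python
# Sketch: check whether a register value consists of printable ASCII bytes.
# This is only a heuristic; a match does not prove the value is text.
def as_printable_ascii(value, width=4):
    raw = value.to_bytes(width, "big")  # most significant byte first
    if all(0x20 <= b < 0x7F for b in raw):
        return raw.decode("ascii")
    return None

print(as_printable_ascii(0x4C535F50))  # rax / rdi, prints: LS_P
print(0x4C535F5E - 0x4C535F50)         # rdx - rax, prints: 14
print(hex(0x823675 - 453))             # function start, prints: 0x8234b0
```

On a little-endian machine the same bytes would appear in memory in the
order "P_SL". If the value really is text, that would fit the idea that
request data ended up where an address was expected.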

A quick look at the source shows that the function that crashed here is a
very central one.

We first migrated to 7.6.0.36 (normal installation, import of data via
loadercli). The database ran for approximately 5 hours before the kernel
crashed for the first time (with the same backtrace, by the way). Afterwards
we updated to version 7.6.1.15, but the database crashed again multiple times.

The database runs on an 8-way Intel Core2 machine with 32 GB RAM and SuSE
SLES10 x86_64 (Linux kernel 2.6.16).

I am now trying to reproduce the error; could you give me a hint? Maybe the
request is too big for an internal buffer or something similar?

Thanks a lot for your help!


bye
Chris

phone: +49 6898/10-4987
fax: +49 6898/10-54987
http://www.saarstahl.de
