Moving database to 'full shutdown' state can corrupt index (at least) on 
long-key text field. Shutdown process can hang w/o returning control to outer 
environment.
-------------------------------------------------------------------------------------------------------------------------------------------------------------------

                 Key: CORE-4914
                 URL: http://tracker.firebirdsql.org/browse/CORE-4914
             Project: Firebird Core
          Issue Type: Bug
          Components: Engine, SVCMGR
            Reporter: Pavel Zotov


I've encountered with lot of errors which can be produced when database is 
moved to shutdown state during intensive DML workload.
It seems that almost all of these errors are caused by presence of long-key 
indexed field, especially when this field is filled with the same text value 
(i.e. 100% duplicates). Errors can happen even on 5-7 attachments when test 
running on machine with slow CPU and/or IO subsystem. All tests was run on 
Windows.

To reproduce:
1)  Download following two files using your svn client:
1.1) 
svn.code.sf.net/p/firebird/code/qa/fbt-repository/trunk/tests/functional/tabloid/batches/dbshut.bat.txt
 
1.2) 
svn.code.sf.net/p/firebird/code/qa/fbt-repository/trunk/tests/functional/tabloid/batches/dbshut.conf
 

2) Rename 'dbshut.bat.tx' to 'dbshut.bat' (i.e. remove last part of its 'long' 
extension).
3) Open 'dbshut.conf' and correct settings related to your FB instance(s) 
(hereafter: 'Cs' = Classic; 'sC' = SuperClassic; 'sS' = SuperServer):
===
fb25Cshome=<path with FB binaries of 2.5 Classic>
fb25Csport =<port which is listening by FB 2.5 Classic instance>

fb25sChome=<path with FB binaries of 2.5 SuperClassic>
fb25sCport=<port which is listening by FB 2.5 SuperClassic instance>

fb25sShome=...
fb25sSport=...

fb30Cshome=...
fb30Csport=...

fb30sChome=...
fb30sCport=...

fb30sShome=...
fb30sSport=...
===

4) Download MTEE utility (Commandline Standard Stream Splitter) from 
http://www.commandline.co.uk/mtee/

Syntax for usage: dbshut.bat <FB> <NN> 2>&1 | mtee /t <log>

where:
  <FB>  is case-insensitive abbrev. of architecture ( SS | SC |CS) concatenated 
with FB version ( 25 | 30 );
  <NN>  is number of opening ISQL sessions;
  <log> is name of log file that should store console output.

Usage samples:

1) test FB 2.5 Classic with opening 20 ISQL sessions:

   dbshut.bat cs25 20 2>&1 | mtee /t dbshut_cs25_att20.log

2) test FB 3.0 SuperClassic with opening 35 ISQL sessions:

   dbshut.bat ss30 35 2>&1 | mtee /t dbshut_sc30_att35.log

3) test FB 3.0 SuperServer with opening 40 ISQL sessions:

   dbshut.bat ss30 40 2>&1 | mtee /t dbshut_ss30_att40.log

This test does following:
===================

1) reads config 'dbshut.conf' and determines:
1.1) required table DDL (parameter 'loading_mode'; can be: tiny | small | 
middle | strong | heavy | ultra; value of this parameter defines ratio of 
index-key to page_size);
1.2) database attributes: page_size and FW;
1.3) how long should all ISQL sessions do their DML work (inserts into table 
with indexed field) - see parameters dml_work_time_*;
2) created temp database with test table;
3) prepares config for trace and launches 'FBSVCMGR action_trace_start ...';
4) starts in separate windows required number of ISQL sessions (their number is 
taken from command-line argument %2 to this batch);
5) forces each ISQL session periodically check until all other ISQLs become 
ready to perform DML. All subsequent DML will be start only when ALL required 
attachments are established - this process not instant and can take several 
seconds.  When all ISQLs are ready to perform DML, this action (INSERT INTO 
...) will be auto-started by them and will be done during dml_work_time_* 
milliseconds.
6) After DML was done during 'dml_work_time_*', batch initiates SHUTDOWN 
process (FBSVCMGR ... prp_shutdown_mode prp_sm_full prp_force_shutdown 0)
7) After control from FBSVCMGR which makes shutdown will return batch will stop 
TRACE session which was launched on step "3)" and wait several seconds until IO 
subsystem will finish flushing trace log on disk.
8) After that batch will parse trace log and try to find there 1st occurence of 
word "shutdown". This step was added at the beginning of this test 
implementation: its purpose was to check that no actions can be done by common 
ISQL attachments  after database will be moved to 'shutdown' state. Currently 
this step can be considered as excessive.
9) Time difference between moments of start and finish shutdown process is 
evaluated and logged. Also, timestamps for each ISQL session when it finishes 
with DML due to db-shutdown are logged.
10) Move database back in ONLINE state and run validation (FBSVCMGR 
action_validate dbname ...)

After this batch will finish, one may see its log created by MTEE (e.g. 
"dbshut_ss30_att10.log" for command like: dbshut.bat ss30 40 2>&1 | mtee /t 
dbshut_ss30_att40.log ).

On WI-V3.0.0.31981 (07-aug-2015) following messages appeared:
================
Page 0 wrong type (expected 7 encountered 1)
page NNN is of wrong type (expected 7, found 32)
(or: "(expected 7, found -67)", "(expected 7, found -68)", "(expected 7, found 
45)")
internal Firebird consistency check (error during savepoint backout (290), 
file: exe.cpp line: 4097)
internal Firebird consistency check (index inconsistent (204), file: btr.cpp 
line: 4987)
Index N has orphan child page at page MMMM in table FIX (129)
Index N is corrupt on page MMMM level 2 at offset KKKK. File: 
..\..\..\src\jrd\validation.cpp, line: 2136
Index N has inconsistent left sibling pointer, page MMMM level 2 in table FIX 
(129)
Index N misses node on page MMMM level 2 in table FIX (129)

On WI-V3.0.0.32008 (23-aug-2015) following messages appeares:
================
Error: Page NNN wrong type (expected data encountered unknown (210))
Error: Data page NNN {sequence MMM} is confused
Warning: Pointer page NNN {sequence 0} bits {0x01 full} are not consistent with 
data page MMM {sequence 111} state {0x00 }
Warning: Index N misses node on page NNN level 2 at offset MMMM
Error: Index N is corrupt on page MMMM level 2 at offset KKKK. File: 
..\..\..\src\jrd\validation.cpp, line: 2055
Warning: Index N misses node on page MMMM level 2 at offset KKKK
Error: Page NNN wrong type (expected index B-tree encountered unknown (203))
Warning: Index N has inconsistent left sibling pointer, page MMMM level 2 at 
offset KKKK
Error: Index N is corrupt on page MMMM level 2 at offset KKKK. File: 
..\..\..\src\jrd\validation.cpp, line: 2258
Error: Page N wrong type (expected data encountered purposely undefined)
Error: Index N is corrupt on page MMMM level 1 at offset KKKK. File: 
..\..\..\src\jrd\validation.cpp, line: 2258
Error: Page N wrong type (expected data encountered purposely undefined)
Warning: Pointer page N {sequence 0} bits {0x00 } are not consistent with data 
page M {sequence 1183} state {0x10 empty}

Also, there were strange hangs of FBSVCMGR where batch could not receive 
control from "prp_shutdown_mode prp_sm_full prp_force_shutdown 0" command. In 
such cases FB service most likely should be restarted because Ctrl-Break does 
not work at all.

PS. As far as I can see after dozens of this test run, most affect on database 
shutdown result have following values:
1) length of indexed field;
2) durability of DML.

PPS. Errors can occur not only on poor hardware. I have several results 
obtained by Simonov Denis which he got on power server also, but with increased 
number of attachments (200...300).

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://tracker.firebirdsql.org/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

------------------------------------------------------------------------------
Firebird-Devel mailing list, web interface at 
https://lists.sourceforge.net/lists/listinfo/firebird-devel

Reply via email to