Error stack can be broken after several minutes under heavy workload
--------------------------------------------------------------------
Key: CORE-4553
URL: http://tracker.firebirdsql.org/browse/CORE-4553
Project: Firebird Core
Issue Type: Bug
Affects Versions: 3.0 Alpha 2
Reporter: Pavel Zotov
Consider the following DDL:
recreate table ttt(id bigint primary key, x int); commit;
insert into ttt select i, 1 from (select row_number()over() i from rdb$types
rows 100) order by rand();
commit;
create descending index ttt_id_desc on ttt(id);
commit;
set term ^;
create or alter procedure p_15(a_rnd double precision) returns (dbkey dm_dbkey)
as
declare c cursor for (select rdb$db_key dbkey from ttt where id >= :a_rnd
order by id rows 1 FOR UPDATE WITH LOCK);
begin
execute statement ('select rdb$db_key dbkey from ttt where id >= :r order by
id rows 1 FOR UPDATE WITH LOCK')
( r := a_rnd)
into dbkey;
suspend;
end
^
create or alter procedure p_14(a_rnd double precision) returns (dbkey dm_dbkey)
as
begin
execute statement ('select p.dbkey from p_15( :r
) p')
( r := a_rnd)
into dbkey;
suspend;
end
^
--- . . . so on downto p_00 . . .
create or alter procedure p_00(a_rnd double precision) returns (dbkey dm_dbkey)
as
begin
execute statement ('select p.dbkey from p_01( :r ) p')
( r := a_rnd)
into dbkey; suspend;
end
^
set term ;^
commit;
(full text see in attach, file broken_stack_ddl.sql ).
So, we can call any of SPs from: { p_00, p_01, ... p_14 } - and they all
ultimately will call p_15 (most deepest) which attempts to lock first row with
ID >= :some_selected_id in table TTT.
There is huge .sql which is run by ~50...70 attaches and does the follow:
set transaction no wait no auto undo;
set term ^;
execute block returns(dts char(12), id bigint, elapsed_ms int) as
declare id_min double precision;
declare id_max double precision;
declare id_rnd double precision;
declare dbkey char(8) character set octets;
declare t0 timestamp;
begin
select min(id)-0.5 from ttt into id_min;
select max(id)+0.5 from ttt into id_max;
id_rnd = id_min+rand()*(id_max-id_min);
t0='now';
select dbkey from p_03( :id_rnd ) into dbkey; -- ### NB ###
update ttt t set t.x = rand()*1000 -- (select count(*) from
rdb$types,rdb$types)
where t.rdb$db_key = :dbkey
returning id into id;
dts=substring(cast(cast('now' as timestamp) as varchar(25)) from 12 for 12);
elapsed_ms = datediff(millisecond from t0 to cast('now' as timestamp));
suspend;
end
^
set term ;^
commit;
-- and this EB repeats many times (full test see in attach, file
broken_stack_run.sql ).
We can change the name of called SP from p_01 upto p_14.
But error can be reproduced only if 'p_00' . . . 'p_03' is specified in EB (at
least in my env.).
Batch file to launch isqls:
setlocal enabledelayedexpansion enableextensions
set winq=%1
for /l %%i in (1, 1, %winq%) do (
set str=isql 192.168.0.220/3333:oltp30 -n -i broken_stack_run.sql 1^>nul
2^>nul
start /min cmd /c !str!
)
If we run this batch like this: broken_stack_run.bat 70 - it will launch 70
isql windows.
Trace config:
enabled = true
log_errors = true
time_threshold = 0
print_plan = true
explain_plan = true
print_perf = true
max_sql_length = 16384
max_log_size = 9999999999
After run trace and batch: broken_stack_run.bat 70 - we need to wait a few
minutes.
Then we can periodically check log of trace like this (file
`get-broken-lines.sh` in attach) :
grep " : " $1 | grep -v "deadlock\|concurrent\|At" | head -20
(where $1 - the name of trace log).
The following 'broken' messages will be displayed after some minutes:
335544842 : 1
335544842 : 7
335544842 : 5
335544842 : 7
335544842 : 5
335544842 : 5
335544842 : t procedure 'P_09' line: 4, col: 21
335544842 : procedure 'P_04' line: 4, col: 11
335544842 : procedure 'P_04' line: 4, col: 11
335544842 : procedure 'P_05' line: 4, col: 13
335544842 : procedure 'P_11' line: 4, col: 25
335544842 : procedure 'P_11' line: 4, col: 25
335544842 : procedure 'P_11' line: 4, col: 25
335544842 : 5
335544842 : 5
335544842 : procedure 'P_03' line: 4, col: 9
335544842 : t procedure 'P_08' line: 4, col: 19
335544842 : t procedure 'P_08' line: 4, col: 19
335544842 : t procedure 'P_05' line: 4, col: 13
335544842 : t procedure 'P_08' line: 4, col: 19
The full test case please see in attach.
Correction in `broken_stack_run.bat` is required for host name + port +
database alias.
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
http://tracker.firebirdsql.org/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
------------------------------------------------------------------------------
Want excitement?
Manually upgrade your production database.
When you want reliability, choose Perforce
Perforce version control. Predictably reliable.
http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk
Firebird-Devel mailing list, web interface at
https://lists.sourceforge.net/lists/listinfo/firebird-devel