[Firebird-devel] [FB-Tracker] Created: (CORE-4553) Error stack can be broken after several minutes under heavy workload

Pavel Zotov (JIRA) Sun, 14 Sep 2014 03:00:06 -0700

Error stack can be broken after several minutes under heavy workload
--------------------------------------------------------------------


                 Key: CORE-4553
                 URL: http://tracker.firebirdsql.org/browse/CORE-4553
             Project: Firebird Core
          Issue Type: Bug
    Affects Versions: 3.0 Alpha 2
            Reporter: Pavel Zotov


Consider the following DDL:

recreate table ttt(id bigint primary key, x int); commit;
insert into ttt select i, 1 from (select row_number()over() i from rdb$types 
rows 100) order by rand();
commit;
create descending index ttt_id_desc on ttt(id);
commit;

set term ^;

create or alter procedure p_15(a_rnd double precision) returns (dbkey dm_dbkey) 
as
  declare c cursor for (select rdb$db_key dbkey from ttt where id >= :a_rnd 
order by id rows 1 FOR UPDATE WITH LOCK);
begin
  execute statement ('select rdb$db_key dbkey from ttt where id >= :r order by 
id rows 1 FOR UPDATE WITH LOCK')
                    ( r := a_rnd)
                    into dbkey;
  suspend;
end
^
create or alter procedure p_14(a_rnd double precision) returns (dbkey dm_dbkey) 
as
begin
                              execute statement ('select p.dbkey from p_15( :r 
) p')
                    ( r := a_rnd)
                    into dbkey;
  suspend;
end
^

 --- . . . so on downto p_00 . . .

create or alter procedure p_00(a_rnd double precision) returns (dbkey dm_dbkey) 
as
begin
  execute statement ('select p.dbkey from p_01( :r ) p')
                    ( r := a_rnd)
                    into dbkey; suspend;
end
^
set term ;^
commit;

(full text see in attach, file broken_stack_ddl.sql ).

So, we can call any of SPs from: { p_00, p_01, ... p_14 }  - and they all 
ultimately will call p_15 (most deepest) which attempts to lock first row with 
ID >= :some_selected_id in table TTT.

There is huge .sql which is run by ~50...70 attaches and does the follow:

set transaction no wait no auto undo;
set term ^;
execute block returns(dts char(12), id bigint, elapsed_ms int) as
  declare id_min double precision;
  declare id_max double precision;
  declare id_rnd double precision;
  declare dbkey char(8) character set octets;
  declare t0 timestamp;
begin
  select min(id)-0.5 from ttt into id_min;
  select max(id)+0.5 from ttt into id_max;
  id_rnd = id_min+rand()*(id_max-id_min);
  t0='now';
  select dbkey from p_03( :id_rnd ) into dbkey; -- ### NB ###

  update ttt t set t.x = rand()*1000 -- (select count(*) from 
rdb$types,rdb$types)
  where t.rdb$db_key = :dbkey
  returning id into id;

  dts=substring(cast(cast('now' as timestamp) as varchar(25)) from 12 for 12);
  elapsed_ms = datediff(millisecond from t0 to cast('now' as timestamp));
  suspend;
end
^
set term ;^
commit;

-- and this EB repeats many times (full test see in attach, file 
broken_stack_run.sql ).

We can change the name of called SP from p_01 upto p_14.
But error can be reproduced only if  'p_00' . . . 'p_03' is specified in EB (at 
least in my env.).

Batch file to launch isqls:

setlocal enabledelayedexpansion enableextensions
set winq=%1
for /l %%i in (1, 1, %winq%) do (
  set str=isql 192.168.0.220/3333:oltp30 -n -i broken_stack_run.sql 1^>nul 
2^>nul
  start /min cmd /c !str!
)

If we run this batch like this: broken_stack_run.bat  70 - it will launch 70 
isql windows.

Trace config:

enabled = true
log_errors = true
time_threshold = 0
print_plan = true
explain_plan = true
print_perf = true
max_sql_length = 16384
max_log_size = 9999999999

After run trace and batch:  broken_stack_run.bat  70  - we need to wait a few 
minutes.

Then we can periodically check log of trace like this (file 
`get-broken-lines.sh` in attach) :

grep " : " $1  | grep -v "deadlock\|concurrent\|At" | head -20

(where $1 - the name of trace log).

The following 'broken' messages will be displayed after some minutes:

335544842 : 1
335544842 : 7
335544842 : 5
335544842 : 7
335544842 : 5
335544842 : 5
335544842 : t procedure 'P_09' line: 4, col: 21
335544842 :  procedure 'P_04' line: 4, col: 11
335544842 :  procedure 'P_04' line: 4, col: 11
335544842 : procedure 'P_05' line: 4, col: 13
335544842 : procedure 'P_11' line: 4, col: 25
335544842 : procedure 'P_11' line: 4, col: 25
335544842 : procedure 'P_11' line: 4, col: 25
335544842 : 5
335544842 : 5
335544842 : procedure 'P_03' line: 4, col: 9
335544842 : t procedure 'P_08' line: 4, col: 19
335544842 : t procedure 'P_08' line: 4, col: 19
335544842 : t procedure 'P_05' line: 4, col: 13
335544842 : t procedure 'P_08' line: 4, col: 19

The full test case please see in attach.
Correction in `broken_stack_run.bat` is required for host name + port + 
database alias.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://tracker.firebirdsql.org/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

------------------------------------------------------------------------------
Want excitement?
Manually upgrade your production database.
When you want reliability, choose Perforce
Perforce version control. Predictably reliable.
http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk
Firebird-Devel mailing list, web interface at 
https://lists.sourceforge.net/lists/listinfo/firebird-devel

[Firebird-devel] [FB-Tracker] Created: (CORE-4553) Error stack can be broken after several minutes under heavy workload

Reply via email to