Online validation can lead to data inconsistency during intensive DML
---------------------------------------------------------------------

                 Key: CORE-4973
                 URL: http://tracker.firebirdsql.org/browse/CORE-4973
             Project: Firebird Core
          Issue Type: Bug
            Reporter: Pavel Zotov
         Attachments: 
bulk-records-exchange-with-online-validation-data-consistency-test.zip

Consider following scenario (something similar to bank transactions when bulk 
of funds are moving from sources to targets and vice versa):

1. Let we have sequence (name = 'g') and two tables, QA & QB which both can act 
like source and target for "moving" their rows.
2. DDL of these tables is the same: (x int, y int, w int), and also they have 
compound indices on key (w, x, y).
3. Tables QA & QB are filled with 50'000 records, but value of field 'w' 
changes in limit [0, 399].
4. Value for writing in the field 'w' is calculated during initial data filling 
process by formula: mod( gen_id(g,1), 400).
   After 50'000 rows will be inserted into each of tables QA|QB, following 
queries:

   SELECT W, COUNT(*) as CNT FROM QA
   UNION ALL
   SELECT W, COUNT(*) as CNT FROM QB

   -- will return two rows with the same value of CNT = 125.

5. When some ISQL will start job by moving dozen of records from QA to QB (or 
vice versa), total number of rows with the same
   value of 'W' in tables QA & QB should ALWAYS be equal 125 + 125 = 250. This 
value (250) should remain constant regardless 
   of number ISQL sessions and their concurrent conflicts. 
   And of course, this value should not be changed by any actions that can 
occur against database - e.g. online validation.

Some time ago I've found strange effect: when online validation is launched 
during several (even single!) ISQL doing work on 'moving' records
data in the tables QA and QB can become inconsistent. The rule about total sum 
of counts = 125 + 125 = 250  will be broken.

Following is about test implementation.

1. DDL-script for creating test database and initial data filling (see also in 
attached .zip, file "exch-ddl.sql"):
===
    create or alter view v_check_for_mismatch as select 1 id from rdb$database;

    recreate exception exc_strange_gds 'Got strange gdscode = @1.';
    recreate exception exc_mism_detected 'Mismatch detected at least for 
id=@1.';

    set term ^;
    execute block as
    begin
      execute statement 'drop sequence g';
      when any do begin end
    end
    ^ set term ;^

    commit;

    create sequence g;

    recreate table stoptest(id int, qa_cnt int, qb_cnt int);

    recreate table qa(x int, y int, w int);
    recreate table qb(x int, y int, w int);
    recreate table exc_log(fbgds int);

    insert into qa(x, y, w)
    select rand()*5, rand()*5, mod( gen_id(g,1), 400)
    from rdb$types,rdb$types
    rows 50000;

    insert into qb(x, y, w)
    select rand()*5, rand()*5, mod( gen_id(g,1), 400)
    from rdb$types,rdb$types
    rows 50000;

    commit;

    create index qa_xy on qa(w, x, y);
    create index qb_xy on qb(w, x, y);
    commit;

    recreate global temporary table gtt_job(w int primary key using index 
gtt_job_pk, c int);
    commit;

    create or alter view v_check_for_mismatch as
    select w, max(qa_cnt) qa_cnt, max(qb_cnt) qb_cnt
    from (
        select 'qa' src, w, count(*) qa_cnt, 0 as qb_cnt
        from qa
        group by w
        union all
        select 'qb' src, w, 0, count(*) qb_cnt
        from qb
        group by w
    )
    group by w
    having max(qa_cnt) + max(qb_cnt) is distinct from 250;
===

2. This script is what each ISQL should perform (see also file "exch-dml.sql" 
in attach):
===
    set list on;
    set bail on;
    commit;

    set transaction snapshot no wait;
    select
        current_connection
       ,current_transaction
       ,'Start job at ' || current_timestamp as msg 
    from rdb$database;

    set term ^;
    execute block
    as
        declare m int = 50;
        declare i int;
        declare a_w int;
        declare v_c int;
        declare v_w int;
        declare v_x int;
        declare v_y int;
        declare c_qa cursor for (select w, x, y from qa where w = :a_w);
        declare c_qb cursor for (select w, x, y from qb where w = :a_w);
        declare v_qa_inserted int = 0;
        declare v_qb_inserted int = 0;
        declare v_qa_deleted int = 0;
        declare v_qb_deleted int = 0;
    begin

        rdb$set_context('USER_SESSION', 'QA_INSERTED', null);
        rdb$set_context('USER_SESSION', 'QB_INSERTED', null);
        rdb$set_context('USER_SESSION', 'QA_DELETED',  null);
        rdb$set_context('USER_SESSION', 'QB_DELETED' , null);

        if ( exists( select * from stoptest ) ) 
        then
            exception exc_mism_detected using ( (select id from stoptest rows 
1) );

        delete from gtt_job;
        i = m;
        while (i > 0) do
        begin
          merge into gtt_job t
          using (select cast(rand()*400 as int) w, cast(rand()*20 as int) c 
from rdb$database union all select 1,1 from rdb$database where 1=0) s
          on s.w = t.w
          when not matched then insert values(s.w, s.c)
          when matched then update set t.c = t.c + s.c
          ;
          i = i -1;
        end
        
        for select w, c from gtt_job into a_w, v_c
        do begin
          open c_qa;
          while (v_c > 0) do
          begin

            begin
              fetch c_qa into v_w, v_x, v_y;
              if (row_count = 0) then leave;
              delete from qa where current of c_qa;
              insert into qb(w, x, y) values(:v_w, :v_x, :v_y);

              v_qa_deleted = v_qa_deleted + 1;
              v_qb_inserted = v_qb_inserted + 1;

            when any do
              begin
                -- lock_conflict, concurrent_transaction, deadlock, 
update_conflict
                if ( gdscode NOT in (335544345, 335544878, 335544336,335544451 
) )
                then
                    begin
                        in autonomous transaction do insert into exc_log(fbgds) 
values(gdscode);
                        exception exc_strange_gds using (gdscode);
                    end
              end
            end
     
            v_c = v_c - 1;

          end
          close c_qa;
        end
        
        delete from gtt_job;
        i = m;
        while (i > 0) do
        begin
          merge into gtt_job t
          using (select cast(rand()*400 as int) w, cast(rand()*20 as int) c 
from rdb$database union all select 1,1 from rdb$database where 1=0) s
          on s.w = t.w
          when not matched then insert values(s.w, s.c)
          when matched then update set t.c = t.c + s.c
          ;
          i = i -1;
        end
        
        for select w, c from gtt_job into a_w, v_c
        do begin
          open c_qb;
          while (v_c > 0) do
          begin
            begin
              fetch c_qb into v_w, v_x, v_y;
              if (row_count = 0) then leave;
              delete from qb where current of c_qb;
              insert into qa(w, x, y) values(:v_w, :v_x, :v_y);

              v_qb_deleted = v_qb_deleted + 1;
              v_qa_inserted = v_qa_inserted + 1;

            when any do
              begin
                -- lock_conflict, concurrent_transaction, deadlock, 
update_conflict
                if ( gdscode NOT in (335544345, 335544878, 335544336,335544451 
) )
                then
                    begin
                        in autonomous transaction do insert into exc_log(fbgds) 
values(gdscode);
                        exception exc_strange_gds using (gdscode);
                    end
              end
            end

            v_c = v_c -1;

          end
          close c_qb;
        end
      
        rdb$set_context('USER_SESSION', 'QA_INSERTED', v_qa_inserted);
        rdb$set_context('USER_SESSION', 'QB_INSERTED', v_qb_inserted);
        rdb$set_context('USER_SESSION', 'QA_DELETED',  v_qa_deleted);
        rdb$set_context('USER_SESSION', 'QB_DELETED' , v_qb_deleted);

    end
    ^
    set term ;^

    select 
         current_connection
        ,current_transaction
        ,'Finish job at ' || current_timestamp as msg 
        ,rdb$get_context('USER_SESSION', 'QA_INSERTED') as qa_inserted
        ,rdb$get_context('USER_SESSION', 'QB_INSERTED') as qb_inserted
        ,rdb$get_context('USER_SESSION', 'QA_DELETED') as qa_deleted
        ,rdb$get_context('USER_SESSION', 'QB_DELETED') as qb_deleted
        ,iif( rdb$get_context('USER_SESSION', 'QA_INSERTED') is distinct from 
rdb$get_context('USER_SESSION', 'QB_DELETED')
              or 
              rdb$get_context('USER_SESSION', 'QB_INSERTED') is distinct from 
rdb$get_context('USER_SESSION', 'QA_DELETED')
             ,'### ACHTUNG ###'
             ,'Ok.'
            ) as check_result
    from rdb$database;

    commit;

    set bail off;
===

3. This is main batch file ("exch-run.bat" in attached .zip):
===
    @echo off
    setlocal enabledelayedexpansion enableextensions

    set winq=%1
    if .%winq%.==.. set winq=30

    md logs 2>nul

    set host=localhost
    set port=3333
    set dbnm=%~dp0tmp.fdb
    set usr=SYSDBA
    set pwd=masterkey
    set fbc=C:\MIX\firebird\fb30
    set stopfile=logs\1mismatches.txt
    set delay=5

    del %dbnm% 2>nul
    del %stopfile% 2>nul
    del logs\*.log 2>nul
    del logs\*.err 2>nul

    echo create database '%host%/%port%:%dbnm%' user '%usr%' password '%pwd%'; 
| %fbc%\isql -q
    %fbc%\gfix -w async %host%/%port%:%dbnm%

    echo on
    echo in %~dp0exch-ddl.sql; | %fbc%\isql %host%/%port%:%dbnm% -q -user %usr% 
-password %pwd%
    @echo off

    @rem ###############
    set run_online_val=1
    @rem ###############

    echo.
    echo Test database is ready. Validation setting: %run_online_val%
    echo. 
    echo Press any key to launch %winq% ISQL sessions. . .
    echo.
    pause > nul


    for /l %%i in (0, 1, !winq!) do (
      if .%%i.==.0. (
        if .%run_online_val%.==.1. (
          start /min cmd /c exch-val.bat %host% %port% %dbnm% %usr% %pwd% %fbc% 
%stopfile% %delay%
        )
      ) else (
        set /a k=1000+%%i
        set k=!k:~1,3!
        set log=logs\%~n0-!k!.log
        set err=logs\%~n0-!k!.err
        del !log! 2>nul
        del !err! 2>nul
        set cmd_sql=exch-dml.bat %host% %port% %dbnm% %usr% %pwd% %stopfile% ^| 
%fbc%\isql -q 1^>!log! 2^>!err!
        start /min cmd /c !cmd_sql!

      )
    )

    for /l %%i in (1,1,1) do (
      set log=logs\%~n0-chk.log
      set err=logs\%~n0-chk.err
      del !log! 2>nul
      del !err! 2>nul
      set cmd_sql=exch-chk.bat %host% %port% %dbnm% %usr% %pwd% %stopfile% 
%delay% !log! ^| isql -q -pag 9999 1^>!log! 2^>!err!
      echo !cmd_sql!
      start /min cmd /c !cmd_sql!

    )

    echo Main batch has done its work, now wait for mismatches. Bye-bye from 
%~f0.
    exit
===

Change setting in this batch to yours, namely: host, port, dbnm, usr, pwd and 
fbc (path to FB binaries on client machine).

Starting part of this batch is:
    ...
    for /l %%i in (0, 1, !winq!) do (
     ...
    )
    ...

This part will launch in separate windows two auxiliary batches (they also are 
in attached .zip):
a) batch for online validation will be run ('exch-val.bat') - but only if 
setting %run_online_val% = 1, see below about it.
b) batch for each ISQL session will be launched in order to make bulk records 
movements ('exch-dml.bat')


Final part of main batch ("exch-run.bat") is:
    ...
      set cmd_sql=exch-chk.bat %host% %port% %dbnm% %usr% %pwd% %stopfile% 
%delay% !log! ^| isql -q -pag 9999 1^>!log! 2^>!err!
      echo !cmd_sql!
      start /min cmd /c !cmd_sql!
    ...

This part will launch batch which make check of data consistency in the both 
tables (QA & QB).
It does such check bu querying special VIEW ("v_check_for_mismatch") and 
inserting its result into 'stop'-table:
    ...
    out %stopfile%;
    insert into stoptest(id, qa_cnt, qb_cnt)
    select w, qa_cnt, qb_cnt
    from v_check_for_mismatch;

    set list off;
    select x.*
    from (
        select s.*, 250 as expected_sum, s.qa_cnt + s.qb_cnt actual_sum
        from stoptest s
    ^) x
    order by abs( x.expected_sum - x.actual_sum ^) desc, id;
    set list on;
    commit;
    out;
    ...

When at least one mismatch will be found, table 'stoptest' will become 
non-empty. 
Every ISQL session checks this table on every iteration. When at least one 
record appear in 'stoptest', ISQL session will rase exception (with name 
"exc_mism_detected") and will terminate itself.


Now pay attention on this command:

@rem ###############
set run_online_val=0
@rem ###############

When this variable is set to 0, you can launch as much ISQL sessions as you 
want, and all of them will work infinite time.

And when this variable is changed to 1 (i.e. ONLINE VALIDATION will be launched 
in loop with some delay), even single ISQL
session will work no more than several minutes.

Test will finish with exception that is raised because mismatches will be found 
by "exch-chk.bat" and its script.
So, online validation AFFECTS in some mysterious way on data consistency.


PS. This strange affect can also be observed in the trace log.

For SUCCESSFUL execution of each DML iteration trace will have lines like these:

===
    0 records fetched
        228 ms, 1 write(s), 15429 fetch(es), 3797 mark(s)

    Table                             Natural     Index    Update    Insert    
Delete
    
*********************************************************************************
    RDB$DATABASE                          200
    RDB$INDICES                                       1
    QA                                              541                 599     
  541
    QB                                              599                 541     
  599
    GTT_JOB                               146        10         3        97     
   49
===
(i.e. QA_inserts = QB_deletes = 599; QA_deletes = QB_inserts = 541 - see above).


But when check for data consistency FAILS, one may found in the trace following:

===
    Table                             Natural     Index    Update    Insert    
Delete   Backout     Purge   Expunge
    
***************************************************************************************************************
    RDB$DATABASE                          200
    RDB$INDICES                                       1
    QA                                             3145                         
  186                             1
    QB                                              964
    GTT_JOB                               317        13         9        91     
   44
===

or:
===
    Table                             Natural     Index    Update    Insert    
Delete   Backout     Purge   Expunge
    
***************************************************************************************************************
    RDB$DATABASE                          200
    RDB$INDICES                                       1
    QA                                              541                 599     
  457                           101
    QB                                              599                 316     
  599
    GTT_JOB                               146        10         3        97     
   49
===

or:
===
    Table                             Natural     Index    Update    Insert    
Delete
    
*********************************************************************************
    RDB$DATABASE                          200
    RDB$INDICES                                       3
    QA                                              534                 463     
  337
    QB                                              463                  74     
  463
    GTT_JOB                               146         9         3        97     
   49
===

(in all these samples QA_inserts <> QB_deletes  or QA_deletes <> QB_inserts).

PPS.
Done at WI-V3.0.0.32084, but I've seen this effect yet in july/august-2015 
during launch online-validation against OLTP-EMUL test was in work.



-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://tracker.firebirdsql.org/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

------------------------------------------------------------------------------
Firebird-Devel mailing list, web interface at 
https://lists.sourceforge.net/lists/listinfo/firebird-devel

Reply via email to