Index corruption. Validation puts a "missing entries" message into firebird.log
----------------------------------------------------------------------------

                 Key: CORE-3515
                 URL: http://tracker.firebirdsql.org/browse/CORE-3515
             Project: Firebird Core
          Issue Type: Bug
          Components: Engine
    Affects Versions: 2.1.4, 2.5.0, 2.0.6, 1.5.6, 2.1.3, 2.1.2, 2.0.5, 2.1.1, 2.0.4, 2.1.0
            Reporter: Vlad Khorsun



  Imagine a table T with two indices: PK (unique, which is important) and IDX2
(in my case a very poor index with only 2 distinct values, but that is not so
important). So we have:

CREATE TABLE T (
  ID    INT NOT NULL,
  VAL   INT
);

ALTER TABLE T ADD CONSTRAINT PK PRIMARY KEY (ID);
CREATE INDEX IDX2 ON T (VAL);

INSERT INTO T VALUES (1, 0);
COMMIT;


    The sequence of actions is as follows:

 1. tx1: insert into T values (1, 0)
 2. tx1:   VIO_store 
             returns OK, the new record has recno = 1
 3. tx1:   IDX_store
 4. tx1:       insert_key 
                 index == PK, key == 1, recno == 1
                 returns duplicate error
     i.e. we have a unique violation in index PK;
     note there was no attempt to insert a key into index IDX2

 5. tx1:   VIO_backout
 6. tx1:       delete_record ... OK

 7. tx2: insert into T values (2, 0)
 8. tx2:   VIO_store ... 
             returns OK, the new record has recno = 1, yes, the same recno !!!
 9. tx2:   IDX_store
10. tx2:       insert_key 
                 index == PK, key == 2, recno == 1
                 returns OK
11. tx2:       insert_key
                 index == IDX2, key == 0, recno == 1
                 returns OK
12. tx2: commit 

13. tx1:       IDX_garbage_collect
14. tx1:           BTR_remove
                     index == PK, key == 1, recno == 1
15. tx1:           BTR_remove
                     index == IDX2, key == 0, recno == 1
                     here we removed an index entry that is not ours !!!
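
    For reference, here is an isql-style sketch of the two sessions involved.
This is only illustrative: the dangerous window lies inside the engine, between
delete_record (step 6) and IDX_garbage_collect (steps 13-15), so plain SQL
cannot reproduce the race deterministically.

-- Session 1 (tx1): the insert that fails with a PK violation and gets backed out.
INSERT INTO T VALUES (1, 0);  -- duplicate of the committed row with ID = 1

-- Session 2 (tx2): commits between tx1's delete_record and its index cleanup.
-- It reuses recno = 1 and carries the same IDX2 key value (VAL = 0).
INSERT INTO T VALUES (2, 0);
COMMIT;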

    So, we have on disk (ignoring the pre-existing record with recno = 0):
after (4)  : recno = 1,  PK : entry {key = 1, rec = 1},  IDX2 : no entries
after (6)  : no records, PK : entry {1, 1},              IDX2 : no entries
after (12) : recno = 1,  PK : entries {1, 1} and {2, 1}, IDX2 : entry {0, 1}
after (14) : recno = 1,  PK : entry {2, 1},              IDX2 : entry {0, 1}
after (15) : recno = 1,  PK : entry {2, 1},              IDX2 : no entries

and finally we have a missing entry in index IDX2 for record 1.
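
    As a side note (not part of the report itself), the damage is visible both
to validation (gfix -validate -full -no_update reports it, and the "missing
entries" messages end up in firebird.log, as in the issue title) and to
ordinary queries, because an indexed retrieval silently skips the affected row.
A hedged sketch, assuming the corrupted database from the example above:

-- Natural scan (the expression on VAL prevents the use of IDX2): sees both
-- committed rows, the pre-existing one and the row inserted by tx2.
SELECT COUNT(*) FROM T WHERE VAL + 0 = 0;  -- expected: 2

-- Retrieval via IDX2: record 1 has no IDX2 entry anymore, so if the optimizer
-- picks the index, the row is silently lost from the result.
SELECT COUNT(*) FROM T WHERE VAL = 0;      -- may return 1 instead of 2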


    The issue happens when all of the following conditions are met at the same time:
a) the first insert violates an indexed constraint (unique or foreign key)
b) this indexed constraint is not the physically last index, so some indices
   have no entries for the failed record
c) VIO_backout deletes the record, and at the same time a new record is inserted
   into the same slot on the data page and is assigned the same record number
d) at least one index from the second group in (b) has the same key value in
   the new record as in the failed record
e) the second insert completes before the backout starts to remove the index
   entries of the failed record

    I think the problem could happen not only with two inserts, but with two
updates as well (the first update fails, backout starts, and the second update
arrives at an inappropriate moment). It also seems possible to get "blob not
found" errors for the same reason.
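
    Purely as an illustration of that speculation (not a confirmed
reproduction), the update variant could look like this; the statements are
valid against the table T above, but hitting the bug still depends on the same
internal timing window:

-- Session 1 (tx1): an update that violates the PK and must be backed out
-- (assumes a committed row with ID = 2 already exists).
UPDATE T SET ID = 2 WHERE ID = 1;

-- Session 2 (tx2): a concurrent update touching the same key value / data page
-- while tx1's failed record version is being backed out.
UPDATE T SET VAL = 0 WHERE ID = 2;
COMMIT;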


    To fix the issue I propose to delay the physical record removal (in the
backout case) until all of its index entries have been removed. This prevents
a concurrent insert from creating a new record with the same record number
while the backout is still maintaining the indices.

    To do this, I propose to split the backout into two phases. In the first
phase, do not remove the record from disk but mark it with the current
transaction number and the rpb_gc_active flag. Then clean up the indices, and
only afterwards remove the backed-out record version completely.

  It seems safe because:
a) marking the record with the current transaction number prevents concurrent
   backouts
b) marking the record with the rpb_gc_active flag allows readers to skip this
   record version and read the previous one (which will be the primary record
   version after the backout completes)
c) if our process dies during the backout, the next process will see our
   transaction as dead and will start all over again

    The patch passed my tests and has also been running in production for over
a month at the site of Pavel Zotov, who reported the bug and was of great help
in investigating it.


-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://tracker.firebirdsql.org/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

------------------------------------------------------------------------------
Simplify data backup and recovery for your virtual environment with vRanger.
Installation's a snap, and flexible recovery options mean your data is safe,
secure and there when you need it. Discover what all the cheering's about.
Get your free trial download today. 
http://p.sf.net/sfu/quest-dev2dev2 
Firebird-Devel mailing list, web interface at 
https://lists.sourceforge.net/lists/listinfo/firebird-devel

Reply via email to