[ http://issues.apache.org/jira/browse/DERBY-662?page=all ]

Mike Matrigali reopened DERBY-662:
----------------------------------

merging fix into 10.0

> during crash recovery of a drop table, on case insensitive files systems derby may delete wrong file
> ----------------------------------------------------------------------------------------------------
>
>          Key: DERBY-662
>          URL: http://issues.apache.org/jira/browse/DERBY-662
>      Project: Derby
>         Type: Bug
>   Components: Store
>     Versions: 10.1.1.1
>  Environment: jvm/os/filesystem where file names are case insensitive such
>               that delete of C2080.dat will remove c2080.dat if it exists.
>     Reporter: Mike Matrigali
>     Assignee: Mike Matrigali
>     Priority: Blocker
>      Fix For: 10.1.2.1, 10.2.0.0
>  Attachments: dropcrash.diff
>
> Sometimes during redo the system will incorrectly remove the file associated
> with a table. The bug requires the following conditions to reproduce:
> 1) The OS/filesystem must be case insensitive such that a request to delete
>    a file named C2080.dat would also remove c2080.dat. This is true of the
>    default Windows file systems, and not true of any Unix/Linux filesystems
>    that I am aware of.
> 2) The system must be shut down uncleanly, such that a subsequent access of
>    the database causes a REDO recovery action of a drop table statement.
>    This means that a drop table statement must have happened since the last
>    checkpoint in the log file. Examples of things that cause checkpoints
>    are:
>    o clean shutdown from ij using the "exit" command
>    o clean shutdown of database using the "shutdown=true" url
>    o calling the checkpoint system procedure
>    o generating enough log activity to cause a regularly scheduled checkpoint.
> 3) If the conglomerate number of the above described drop table is TABLE_1,
>    then for a problem to occur there must also exist in the database a table
>    TABLE_2 such that HEX(TABLE_2) = TABLE_1, that is, the hexadecimal form
>    of TABLE_2's conglomerate number reads the same as the decimal digits of
>    TABLE_1 (for example 8320 is hex 2080, which matches 2080).
> 4) Either TABLE_2 must not be accessed during REDO prior to the REDO operation
>    of the drop of TABLE_1, or there must be enough other table references
>    during the REDO phase to push the cached open of TABLE_2 out of cache.
> If all of the above conditions are met then during REDO the system will
> incorrectly delete TABLE_2 while trying to redo the drop of TABLE_1.
>
> I will be adding the following test to reproduce the problem:
> 1) create 500 tables; this is enough tables to ensure that conglomerate
>    numbers 2080 (c820.dat) and 8320 (c2080.dat) exist.
> 2) checkpoint the database so that the creates do not happen during REDO.
> 3) drop the table with conglomerate number 2080, mapping to c820.dat. The
>    test looks the table up in the catalog in case conglomerate number
>    assignment changes for some reason.
> 4) exit the database without a clean shutdown. This is the default for test
>    suites which run multiple tests in a single db - no clean shutdown is done.
>    Since only a single drop has happened since the last checkpoint, that
>    drop will be replayed during the subsequent REDO.
> 5) run the next test program, dropcrash2, which will cause redo of the drop.
>    At this point the bug will cause file c2080.dat to be incorrectly deleted,
>    and thus accesses to conglomerate 8320 will throw container does not
>    exist errors.
> 6) check the consistency of the database, which will find the container does
>    not exist error.
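
The number relationship in condition 3 and in test step 1 can be checked with a few lines of Java. This is only a sketch of the arithmetic, not Derby's code: the class and helper names are hypothetical, and the "c" + hex + ".dat" naming is taken from the file names quoted in the report (2080 -> c820.dat, 8320 -> c2080.dat).

```java
public class ConglomerateNameCollision {

    // Hypothetical helper: data file name for a conglomerate number,
    // following the pattern quoted in the report ("c" + hex + ".dat").
    static String dataFileName(long conglomerateNumber) {
        return "c" + Long.toHexString(conglomerateNumber) + ".dat";
    }

    // Hypothetical helper for condition 3: true when the hex form of the
    // candidate reads the same as the decimal digits of the dropped table.
    static boolean collides(long dropped, long candidate) {
        return Long.toHexString(candidate).equals(Long.toString(dropped));
    }

    public static void main(String[] args) {
        long dropped = 2080;   // TABLE_1 in the report
        long victim = 8320;    // TABLE_2 in the report

        System.out.println(dataFileName(dropped));     // c820.dat
        System.out.println(dataFileName(victim));      // c2080.dat
        System.out.println(collides(dropped, victim)); // true

        // On a case insensitive file system, the name from the Environment
        // field (C2080.dat) refers to the same file as the victim's file.
        System.out.println("C2080.dat".equalsIgnoreCase(dataFileName(victim))); // true
    }
}
```

Using equalsIgnoreCase here only stands in for what a case insensitive file system does when asked to delete C2080.dat; a case sensitive file system would treat the two names as different files, which is why condition 1 is required.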
