[ 
https://issues.apache.org/jira/browse/DERBY-5638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228473#comment-13228473
 ] 

Mamta A. Satoor commented on DERBY-5638:
----------------------------------------

I debugged more to see why we get 5 lock timeouts with embedded driver and then 
3 locks with netwrok server before running into intermittent failure. My 
initial thought was that we will get 5 lock time outs both with embedded and 
network server before we give up trying to drop the tables.

Here is the general flow chart from dropping tables point of view when the 
suite is using CleanDatabaseTestSetup. 

At the end of the last test fixture in the suite, 
CleanDatabaseTestSetup.tearDown gets called which does following(focusing on 
drop table related code, ignoring the code related to dropping aliases, views 
etc)
CleanDatabaseTestSetup.tearDown->
CleanDatabaseTestSetup.cleanDatabase->
CleanDatabaseTestSetup.removeObjects->
        JDBC.dropSchema->
        ....
        // Tables
        rs = dmd.getTables((String) null, schema, (String) null,
                GET_TABLES_TABLE);
                
        dropUsingDMD(s, rs, schema, "TABLE_NAME", "TABLE");
                ....
        // At this point there may be tables left due to
                // foreign key constraints leading to a dependency loop.
        // Drop any constraints that remain and then drop the tables.
        // If there are no tables then this should be a quick no-op.
        .....
        // Tables (again)
                rs = dmd.getTables((String) null, schema, (String) null,
                        GET_TABLES_TABLE);        
        dropUsingDMD(s, rs, schema, "TABLE_NAME", "TABLE");

The code in dropUsingDMD is as follows
First try to drop the objects in a batch rather than individually.If there is 
any BatchUpdateException while dropping the objects as a batch, drop the 
remaining objects inidividually in a for loop. If the for loop gets
an exception but some of the objects before the exception got dropped 
successfully, then go back into the for loop and try to drop remaining objects 
again. This continues until either
a)All objects are dropped and we can return
b)no objects can be dropped because of exception

After this CleanDatabaseTestSetup.tearDown is finished(with or without 
success), the control goes to the CleanDatabaseTestSetup.setUp for the next 
suite. Among other things, CleanDatabaseTestSetup.setUp does following 2 main 
steps 
1)call CleanDatabaseTestSetup.cleanDatabase (drop any objects in the db)
2)call decorateSQL(create new objects in the db)

Now, let's go through this code in our intermittent failure case. The last test 
from the embedded suite LobLimitsTest finishes with no errors and we go to 
CleanDatabaseTestSetup.tearDown(). This eventually leads to JDBC.dropUsingDMD 
as shown above. 

In JDBC.dropUsingDMD(), we first try to drop all the tables as a batch rather 
than individual drops. Here, we get the first lock timeout. Since we failed 
here, next we go in a loop to drop the tables individually. Here we manage to 
drop "APP"."CLOBTBL2" but get lock time out(2nd one) for trying to drop 
"APP"."CLOBTBL". But since the loop was able to drop at least one table, we go 
back into the loop to try to drop remaining tables. But that results into 3rd 
lock timeout for "APP"."CLOBTBL".  After these tries, the control goes back to 
JDBC.dropSchema where we try to drop the tables yet again after dropping any 
constraints which may be preventing the table drops(in our case, I don't 
believe we have any constraints in the db). This 2nd attempt at dropping the 
table takes us back to JDBC.dropUsingDMD.Again we try to drop all the tables as 
batch but get the 4th lock timeout for "APP"."CLOBTBL". Next, we try to drop 
the tables inidividually in a loop, but the drop of "APP"."CLOBTBL" fails with 
5th lock timeout. And since we were not able to drop any table in the loop, we 
exit out of the loop and eventually exit out of CleanDatabaseTestSetup.tearDown 
without reporting all this as a failure even though we were not able to clean 
the database of it's objects.

Next, before the beginning of the next suite(which in our case happens to be 
network server suite), we go through the CleanDatabaseTestSetup.setUp code. As 
explained earlier, one of the things that CleanDatabaseTestSetup.setUp does is 
to call CleanDatabaseTestSetup.cleanDatabase. Which again goes through the drop 
table logic as explained above. It will attempt to drop the existing tables 
in batch but that runs into 1st lock timeout for "APP"."CLOBTBL". Next, it 
tries to drop the table individually, and it runs into 2nd lock timeout. Then 
the control goes back to CleanDatabaseTestSetup.removeObjects which calls drop 
table again after it drops the constraints. This time around, the batch 
exception runs into 3rd lock timeout exception. With my understanding of
the code, we should have run into 4th lock timeout trying to drop the table 
individually but for some reason, I don't see that 4th lock time out with DRDA 
driver in derby.log. In any case, we once again exit 
CleanDatabaseTestSetup.cleanDatabase without successfully getting rid of all 
the tables and next step in CleanDatabaseTestSetup.setUp is the call to 
decorateSQL().decorateSQL fails when it tries to create tables but they already 
exist in the database becaue CleanDatabaseTestSetup.cleanDatabase was not able 
to get rid of them and that is why we get the exception Table/View 'BLOBTBL' 
already exists in Schema 'APP'.

I am not sure why this locking issue is only intermittent but atleast we 
understand what is happening during the suite runs.

Not sure if it is correct for CleanDatabaseTestSetup.cleanDatabase to run into 
exceptions while doing the cleanup but ignore those exceptions. I ran into a 
very old jira DERBY-2220(Uncommitted transactions executed throught XAResource 
will held locks after the application terminates (or crashes during the 
transaction).) where there was a brief conversation about how may be 
CleanDatabaseTestSetup.cleanDatabase should drop the database if it is not able 
to drop the objects inside the database. Here is part of Dan Debrunner's 
comment on that issue
******************
I also think that you've found an issue in the clean database decorator and as 
such it would be good to fix it centrally there, rather than in a single test. 
for example, if the clean database decorator failed to cleanup the database, 
it could report the failure and then blow away the database.
******************

We can do atleast one of the two things for this particular jira at this point 
(or may be even both)
1)Have CleanDatabaseTestSetup.cleanDatabase report that it is having trouble 
deleting the objects and hence it will drop the database and create the objects 
mentioned in decorateSQL. This will get rid of intermittent failures in large 
data suite but it will not explain why do we intermittently run into lock 
timeouts
2)Debug why we are running into intermittently lock timeouts with large data 
suite. I am running the tests with derby.locks.deadlockTrace=true to see if it 
will give us more information about the locks.

Please let me know if there are any other options we could be pursuing.
                
> intermittent test failure in test_05_ClobNegative when running full 
> largedata._Suite; LobLimitsTestjava.sql.SQLException: Table/View 'BLOBTBL' 
> already exists in Schema 'APP'.
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: DERBY-5638
>                 URL: https://issues.apache.org/jira/browse/DERBY-5638
>             Project: Derby
>          Issue Type: Bug
>          Components: Test
>    Affects Versions: 10.9.0.0
>         Environment: ibm 1.6 sr9 fp1, Seen on Windows XP/VMWare, and Linux 
> (CentOS)/VMWare
>            Reporter: Myrna van Lunteren
>            Assignee: Mamta A. Satoor
>         Attachments: CompleteRunallWithPrintlnTrace.out, 
> DERBY5638_patch1_diff.txt, DERBY5638_patch2_diff.txt, derbyFailed.log, 
> derbyPass.log, derbyWithRollbackInTest_05.log, 
> derbyallFailWithPrintlnTrace.log, runallForSuccessfulLargeDataRun.out, 
> runallWithPrintlnTrace.out
>
>
> I've seen the following failure when running the largedata suite:
> (emb)largedata.Derby5624Test.testDERBY_5624 used 518403 ms .
> (emb)largedata.LobLimitsTest.test_01_Blob used 2422 ms .
> (emb)largedata.LobLimitsTest.test_02_BlobNegative used 31 ms .
> (emb)largedata.LobLimitsTest.test_03_Clob1 used 2375 ms .
> (emb)largedata.LobLimitsTest.test_04_Clob2 used 3234 ms .
> (emb)largedata.LobLimitsTest.test_05_ClobNegative used 516 ms .
> (net)largedata.LobLimitsTest.test_01_Blob used 5360 ms .
> (net)largedata.LobLimitsTest.test_02_BlobNegative used 32 ms .
> (net)largedata.LobLimitsTest.test_03_Clob1 used 2078 ms .
> (net)largedata.LobLimitsTest.test_04_Clob2 used 2390 ms .
> (net)largedata.LobLimitsTest.test_05_ClobNegative used 938 ms .
> (emb)largedata.LobLimitsTest.test_01_Blob used 9188238 ms .
> (emb)largedata.LobLimitsTest.test_02_BlobNegative used 109 ms .
> (emb)largedata.LobLimitsTest.test_03_Clob1 used 8116714 ms .
> (emb)largedata.LobLimitsTest.test_04_Clob2 used 2164002 ms .
> (emb)largedata.LobLimitsTest.test_05_ClobNegative used 685745 ms E
> Time: 22,320.138
> There was 1 error:
> 1) LobLimitsTestjava.sql.SQLException: Table/View 'BLOBTBL' already exists in 
> Schema 'APP'.
>       at 
> org.apache.derby.client.am.SQLExceptionFactory40.getSQLException(Unknown 
> Source)
>       at org.apache.derby.client.am.SqlException.getSQLException(Unknown 
> Source)
>       at org.apache.derby.client.am.Statement.execute(Unknown Source)
>       at 
> org.apache.derbyTesting.functionTests.tests.largedata.LobLimitsTest.setupTables(LobLimitsTest.java:107)
>       at 
> org.apache.derbyTesting.functionTests.tests.largedata.LobLimitsTest$1.decorateSQL(LobLimitsTest.java:141)
>       at 
> org.apache.derbyTesting.junit.CleanDatabaseTestSetup.setUp(CleanDatabaseTestSetup.java:112)
>       at junit.extensions.TestSetup$1.protect(TestSetup.java:20)
>       at junit.extensions.TestSetup.run(TestSetup.java:25)
>       at 
> org.apache.derbyTesting.junit.BaseTestSetup.run(BaseTestSetup.java:57)
>       at junit.extensions.TestDecorator.basicRun(TestDecorator.java:24)
>       at junit.extensions.TestSetup$1.protect(TestSetup.java:21)
>       at junit.extensions.TestSetup.run(TestSetup.java:25)
>       at junit.extensions.TestDecorator.basicRun(TestDecorator.java:24)
>       at junit.extensions.TestSetup$1.protect(TestSetup.java:21)
>       at junit.extensions.TestSetup.run(TestSetup.java:25)
>       at 
> org.apache.derbyTesting.junit.BaseTestSetup.run(BaseTestSetup.java:57)
>       at junit.extensions.TestDecorator.basicRun(TestDecorator.java:24)
>       at junit.extensions.TestSetup$1.protect(TestSetup.java:21)
>       at junit.extensions.TestSetup.run(TestSetup.java:25)
> Caused by: org.apache.derby.client.am.SqlException: Table/View 'BLOBTBL' 
> already exists in Schema 'APP'.
>       at org.apache.derby.client.am.Statement.completeSqlca(Unknown Source)
>       at 
> org.apache.derby.client.am.Statement.completeExecuteImmediate(Unknown Source)
>       at 
> org.apache.derby.client.net.NetStatementReply.parseEXCSQLIMMreply(Unknown 
> Source)
>       at 
> org.apache.derby.client.net.NetStatementReply.readExecuteImmediate(Unknown 
> Source)
>       at 
> org.apache.derby.client.net.StatementReply.readExecuteImmediate(Unknown 
> Source)
>       at 
> org.apache.derby.client.net.NetStatement.readExecuteImmediate_(Unknown Source)
>       at org.apache.derby.client.am.Statement.readExecuteImmediate(Unknown 
> Source)
>       at org.apache.derby.client.am.Statement.flowExecute(Unknown Source)
>       at org.apache.derby.client.am.Statement.executeX(Unknown Source)
>       ... 26 more
> Unfortunately, when this happens, there seems to be no 'fail' directory 
> created. The derby.log in the system directory looks very innocent (just some 
> start up and shutting down of the database), and the serverConsoleOutput.log 
> only has the typical 'failed to find db 'wombat' messages'.
> Note, when this happens, the suite exits, so that instead of the expected 20 
> (or 21 on windows, see DERBY-5624 for reason for skipping on Linux default 
> installs with 1024 max open files) we only get 15 (or 16) tests run - if the 
> test doesn't fail it goes on to run the last 5 fixtures again for network 
> server.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira


Reply via email to