[ 
https://issues.apache.org/jira/browse/DERBY-241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kathey Marsden resolved DERBY-241.
----------------------------------

    Resolution: Cannot Reproduce

There is no indication that this has happened since 2005, so resolving as 
Cannot Reproduce.

> Encrypted run of stress.multi test failed once with a boot error with ibm142
> ----------------------------------------------------------------------------
>
>                 Key: DERBY-241
>                 URL: https://issues.apache.org/jira/browse/DERBY-241
>             Project: Derby
>          Issue Type: Bug
>          Components: Store
>    Affects Versions: 10.1.1.0
>         Environment: ibm142, machine is a dell, 1cpu, 256MB RAM, ~497Mhz, has 
> an IDE disk and has write cache enabled.
>            Reporter: Sunitha Kambhampati
>         Attachments: encryption_multi.zip, od_c_c180.txt, pageDataHexDump.txt
>
>
> The stress.multi test failed once during the encryption run with ibm142 
> while running the derbyall suite. I have not been able to reproduce the 
> failure since then. 
> The machine on which it failed is a Dell with 1 CPU, 256MB RAM, ~497MHz, an 
> IDE disk, and write cache enabled. As far as I can tell, the machine was up 
> and running OK while the tests were running.
> Looking at the test directory for the stress.multi test, derby.log shows a 
> lot of interrupts, and the errors include the following boot error: 
>  Booting Derby version The Apache Software Foundation - Apache Derby - 
> 10.1.0.0 alpha - (31132): instance c013800d-0103-64b3-44ec-ffffa1f4cf33
> on database directory 
> E:\classtest\JarResults.2005-04-20\ibm142_derbyall\derbyall\encryptionAll\encryption\multi\stress\mydb
>  
> ERROR XSLA7: Cannot redo operation Page Operation: Page(5,Container(0, 384)) 
> pageVersion 3 : Insert :  Slot=2 recordId=8 in the log.
> Here are some of my notes from trying to debug this: 
> 0) Copied the problematic database to a safe location and used sane jars for 
> debugging.
> 1) Tried to boot the database using ij with the debug property 
> derby.debug.true=DumpLogOnly set, which dumped all the log records into 
> derby.log. Searching for log records for Container(0,384) found only 3 
> records pertaining to it: one for create container and 2 for inserts. 
>     Space operation for create container (0,384)
>     Page operation for (Page 5, Container(0,384)), version 3, 
>     involving an insert at slot 2, record 8 
>     Page operation for version 4, involving an insert at slot 3, 
>     record 9 
> => There was no initPage operation for this page, nor any records pertaining 
> to page versions 1 and 2. This means that log records are missing. The only 
> case where this would be OK is a system catalog table: during create 
> database the data pages are flushed directly to disk, so the absence of log 
> records is OK in that case. 
> 2) Next step: tried to verify whether it is a system catalog table. 
> Looking at getNextConglomId() in 
> org.apache.derby.impl.store.access.RAMAccessManager, the container key 384 
> maps to the 18th id (384 is 0x180 in hex). 
> One way I verified this was to create another empty database and check 
> whether the file c180.dat existed in it, and it did. So this is indeed a 
> system catalog table. 
> 3) To find the actual cause of the redo exception, I added stack trace 
> printing to the code and ran it under the debugger. The error printed was 
> ERROR XSDB1: Unknown page format at page Page(5,Container(0, 384))
> It seemed like the page format was messed up. I added printlns to get the 
> page format id (in CachedPage, setIdentity) and tried to dump the contents 
> of the page. 
> Checksum validation would have happened if the format id had been OK, but 
> since the format id was messed up here, this error is thrown instead of a 
> checksum error. 
> 4) MKS has an od utility that dumps file contents in hex and character 
> format. This table maps to the 18th id, which is c180.dat in the seg0 
> directory. Running od -c c180.dat shows output like this: 
>  S   Y   S   C   S   _   B   A   C
> 0000034040     K   U   P   _   D   A   T   A   B   A   S   E   _   A   N   D
> 0000034060     _   E   N   A   B   L   E   _   L   O   G   _   A   R   C   H
> 0000034100     I   V   E   _   M   O   D   E  \
> These appear to be system catalog procedure names, and it seems odd that 
> they are not encrypted. 
> Need to verify whether system catalogs are encrypted; if so, this is 
> probably an interrupt problem with encryption. 
>  
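Step 2 of the quoted notes relates container key 384 to the file c180.dat. A minimal sketch of that mapping follows, assuming the file name is simply "c" plus the lowercase hex form of the container id plus ".dat". That naming rule is an assumption inferred from the 384 / c180.dat pair in the report, and the function names are hypothetical, not Derby APIs.

```python
def container_file_name(container_id: int) -> str:
    """Return the assumed seg0 file name for a container id.

    Assumption: name = "c" + lowercase hex(container id) + ".dat",
    which matches the 384 -> c180.dat pair in the report.
    """
    if container_id < 0:
        raise ValueError("container id must be non-negative")
    return "c{:x}.dat".format(container_id)


def container_id_from_file_name(name: str) -> int:
    """Inverse of container_file_name under the same assumption."""
    if not (name.startswith("c") and name.endswith(".dat")):
        raise ValueError("unexpected container file name: " + name)
    # Strip the leading "c" and trailing ".dat", then parse the hex digits.
    return int(name[1:-4], 16)


print(container_file_name(384))                 # c180.dat
print(container_id_from_file_name("c180.dat"))  # 384
```

Dropping the last hex digit of 0x180 leaves 0x18, which may be what "18th id" refers to. That reading is a guess and is not confirmed by the report.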

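Step 4 of the quoted notes spots readable procedure names in a file that was expected to be encrypted. The check can be roughly automated by scanning a file's bytes for long runs of printable ASCII, similar in spirit to reading od -c output by eye. This is only a sketch: the function name and the 8-byte threshold are assumptions, and the sample buffer in the usage lines is made up for illustration. Encrypted data can contain short printable runs by chance, so a hit is a hint, not proof.

```python
import string

# Byte values treated as "readable": letters, digits, punctuation, and space.
PRINTABLE = set(
    (string.ascii_letters + string.digits + string.punctuation + " ").encode("ascii")
)


def printable_runs(data: bytes, min_len: int = 8):
    """Yield (offset, text) for each printable-ASCII run of at least min_len bytes."""
    start = None
    for i, b in enumerate(data):
        if b in PRINTABLE:
            if start is None:
                start = i  # a new run begins here
        else:
            if start is not None and i - start >= min_len:
                yield start, data[start:i].decode("ascii")
            start = None
    # Handle a run that extends to the end of the buffer.
    if start is not None and len(data) - start >= min_len:
        yield start, data[start:].decode("ascii")


# Made-up buffer: binary noise around a name like the one seen in the dump.
sample = b"\x00\x01SYSCS_BACKUP_DATABASE\xff\x02abc\x00"
for offset, text in printable_runs(sample):
    print(offset, text)
```

To scan a real container file, the bytes could be read with open(path, "rb").read() and passed to printable_runs.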
-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
