[
https://issues.apache.org/jira/browse/DERBY-241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Kathey Marsden resolved DERBY-241.
----------------------------------
Resolution: Cannot Reproduce
There is no indication that this has happened since 2005, so resolving as
Cannot Reproduce.
> Encrypted run of stress.multi test failed once with a boot error with ibm142
> ----------------------------------------------------------------------------
>
> Key: DERBY-241
> URL: https://issues.apache.org/jira/browse/DERBY-241
> Project: Derby
> Issue Type: Bug
> Components: Store
> Affects Versions: 10.1.1.0
> Environment: ibm142, machine is a dell, 1cpu, 256MB RAM, ~497Mhz, has
> an IDE disk and has write cache enabled.
> Reporter: Sunitha Kambhampati
> Attachments: encryption_multi.zip, od_c_c180.txt, pageDataHexDump.txt
>
>
> The stress.multi test failed once in the encryption run with ibm142 while
> running the derbyall suite, and I have not been able to reproduce it since.
> The machine on which it failed is a Dell with 1 CPU, 256MB RAM, ~497MHz, an
> IDE disk, and write cache enabled. As far as I can tell, the machine was up
> and running OK while the tests were running.
> In the test directory for stress.multi, derby.log shows a lot of interrupts,
> and the errors include the following boot error.
> Booting Derby version The Apache Software Foundation - Apache Derby -
> 10.1.0.0 alpha - (31132): instance c013800d-0103-64b3-44ec-ffffa1f4cf33
> on database directory
> E:\classtest\JarResults.2005-04-20\ibm142_derbyall\derbyall\encryptionAll\encryption\multi\stress\mydb
>
> ERROR XSLA7: Cannot redo operation Page Operation: Page(5,Container(0, 384))
> pageVersion 3 : Insert : Slot=2 recordId=8 in the log.
> Here are some of my notes in trying to debug this:
> 0) Copied the problematic database to a safe location and used sane jars for
> debugging.
> 1) Tried to boot the database using ij with the debug property
> derby.debug.true=DumpLogOnly set, which dumped all the log records into
> derby.log. Searching for log records for container(0,384) found only 3:
> one for create container and 2 for inserts.
> Space Operation for create container ( 0,384)
> Page operation for (Page 5, Container(0,384)), version 3 ,
> involving an insert at slot 2, record 8.
> Page operation for version 4, involving insert at slot 3,
> record 9.
> => There was no initPage operation for this page, nor any records for page
> versions 1 and 2. This means that log records are missing, which is only OK
> if this is a system catalog table: during create database the data pages
> are flushed directly to disk, so having no log records is expected in that
> case.
> 2) Next step: tried to verify whether it was a system catalog table.
> Looking at getNextConglomId() in
> org.apache.derby.impl.store.access.RAMAccessManager, the container key 384
> maps to the 18th id.
> To verify this, I created another empty database and checked whether the
> file c180.dat existed there, and it did, which confirms that this is a
> system catalog table.
> 3) To find the actual cause of the redo exception, I added stack trace
> printing to the code and ran it under the debugger. The error printed was
> ERROR XSDB1: Unknown page format at page Page(5,Container(0, 384))
> It seemed like the page format was messed up. I added printlns to get the
> page format id (in CachedPage, setIdentity) and tried to dump the contents
> of the page.
> Checksum validation would have happened if the format id had been OK, but
> since the format id was messed up, this error is thrown instead of a
> checksum error.
> 4) MKS has an od utility that dumps file contents in hex and character
> format. This table mapped to the 18th id, which is c180.dat in the seg0
> directory. A dump with od -c c180.dat shows content like this:
> S Y S C S _ B A C
> 0000034040 K U P _ D A T A B A S E _ A N D
> 0000034060 _ E N A B L E _ L O G _ A R C H
> 0000034100 I V E _ M O D E \
> These seem to be system catalog procedure names, and it seems odd that they
> are not encrypted.
> Need to verify whether system catalogs are encrypted; if so, this is
> probably an interrupt problem with encryption.
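The od -c check in step 4 can be approximated with a small script that scans
a container file for long runs of printable ASCII. This is only a rough
heuristic sketch, not part of Derby: the function name printable_runs and the
min_len threshold are hypothetical, and a hit only suggests that part of the
file may be stored as plaintext. Encrypted data can contain short printable
runs by chance, which is why the threshold is fairly high.

```python
import sys


def printable_runs(data: bytes, min_len: int = 12):
    """Return (offset, text) for each run of printable ASCII of at least min_len bytes.

    Hypothetical helper for illustration; the threshold is an assumption.
    """
    runs = []
    start = None  # offset where the current printable run began, if any
    for i, b in enumerate(data):
        if 0x20 <= b <= 0x7E:  # printable ASCII, including space
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, data[start:i].decode("ascii")))
            start = None
    # Handle a run that extends to the end of the data.
    if start is not None and len(data) - start >= min_len:
        runs.append((start, data[start:].decode("ascii")))
    return runs


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python scan.py seg0/c180.dat
    with open(sys.argv[1], "rb") as f:
        for offset, text in printable_runs(f.read()):
            print(f"{offset:#010x}  {text}")
```

Running it against a copy of seg0/c180.dat from the failing database and from
a freshly created encrypted database would show whether readable names such
as SYSCS_BACKUP_DATABASE_AND_ENABLE_LOG_ARCHIVE_MODE appear in only one of
them.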
>
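The mapping used in steps 2 and 4 (container 384, the "18th id", and
c180.dat) is hexadecimal arithmetic: 384 is 0x180. The sketch below makes it
explicit. The file naming is inferred from this report (container 384
corresponds to c180.dat). The split of the id into a sequence number and a
low 4-bit factory type is an assumption about how getNextConglomId() builds
ids and should be checked against the source. Both function names are
hypothetical.

```python
def container_file_name(container_id: int) -> str:
    """File name in seg0 for a container id, as inferred from this report."""
    return f"c{container_id:x}.dat"


def split_conglomerate_id(conglom_id: int):
    """Split an id into (sequence number, factory type).

    ASSUMPTION: the id is (sequence << 4) | factory_type; verify against
    RAMAccessManager.getNextConglomId() before relying on this.
    """
    return conglom_id >> 4, conglom_id & 0xF


if __name__ == "__main__":
    print(container_file_name(384))            # c180.dat
    seq, factory = split_conglomerate_id(384)
    print(hex(seq), factory)                   # 0x18 0
```

Under that assumption, the "18th id" in the report is hex 18 (decimal 24).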