[jira] Updated: (DERBY-4239) corruption on z/OS with storerecovery oc_rec? tests. ERROR XSLA7: Cannot redo operation null in the log.

Kathey Marsden (JIRA) Thu, 21 May 2009 08:58:10 -0700

     [ 
https://issues.apache.org/jira/browse/DERBY-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Kathey Marsden updated DERBY-4239:
----------------------------------

    Derby Categories: [Data corruption, Regression Test Failure]

I have not been able to reproduce the failure in 100 iterations with -Xint, so 
at first glance it appears to be a JIT issue.  It does reproduce with 
-Xjit:optLevel=noOpt,count=0, which disables most JIT optimizations.  I was 
incorrect that it does not reproduce with 10.3: I was able to hit the issue 
with 10.3.3.1 - (765035).  I had done the original run with a slightly earlier 
sane build.  I am not sure yet whether it reproduces only with insane builds.

Typically with JIT problems you can generate a log of all the compiled methods 
and their optimization levels, and then 1) feed that log back into the next run 
to get a consistent reproduction, and 2) do a binary search, rerunning with 
half of the log file each time, to narrow down the failing method.
Unfortunately, neither of these methods works in this case, which suggests a 
timing or order-of-compilation issue.
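
For reference, the binary search in step 2 can be sketched as below.  This is 
only an illustration, not the JVM's own tooling: the `fails` predicate is a 
hypothetical stand-in for "rerun the test with JIT compilation limited to this 
subset of methods", and the method names are made up.  It assumes a single 
culprit method and a deterministic failure, which is exactly the assumption 
that does not seem to hold here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

final class JitBisect {
    /**
     * Narrows a list of compiled methods down to the one whose compilation
     * triggers the failure. Assumes fails(subset) is deterministic and that
     * exactly one method in the list is responsible.
     */
    static String bisect(List<String> methods, Predicate<List<String>> fails) {
        if (methods.isEmpty()) {
            throw new IllegalArgumentException("no methods to search");
        }
        List<String> candidates = new ArrayList<>(methods);
        while (candidates.size() > 1) {
            int mid = candidates.size() / 2;
            List<String> firstHalf = new ArrayList<>(candidates.subList(0, mid));
            List<String> secondHalf =
                new ArrayList<>(candidates.subList(mid, candidates.size()));
            // Rerun with only the first half; keep whichever half still fails.
            candidates = fails.test(firstHalf) ? firstHalf : secondHalf;
        }
        return candidates.get(0);
    }

    public static void main(String[] args) {
        // Hypothetical method names standing in for entries in the JIT log.
        List<String> compiled =
            Arrays.asList("A.a()", "B.b()", "C.c()", "D.d()", "E.e()");
        // Stand-in for a real test run: fails whenever the culprit is compiled.
        Predicate<List<String>> fails = subset -> subset.contains("C.c()");
        System.out.println(bisect(compiled, fails)); // prints C.c()
    }
}
```

If the failure depends on timing or on the order in which methods are 
compiled, `fails` is no longer a pure function of the subset and the search 
can converge on the wrong method.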

I would like some tips on how to identify the issue earlier, preferably at the 
point where the bad log record is written to disk.  Is there any way to do this?
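
One crude way to look for the symptom earlier, offered only as a sketch: the 
page dump in the original report is all zeros, so a container file could be 
scanned for all-zero pages right after oc_rec2 finishes, before oc_rec3 tries 
to recover.  The class below is hypothetical and uses no Derby APIs.  The file 
path and page size are supplied by the caller, and the assumption that page N 
starts at byte offset N * pageSize should be checked against the actual 
container layout.  An all-zero page is not necessarily corruption, and this 
inspects data pages, not the log records themselves.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;

final class ZeroPageScan {
    /**
     * Returns the numbers of all pages in the file that contain only zero
     * bytes. Assumes page N occupies bytes [N * pageSize, (N + 1) * pageSize);
     * a trailing partial page is ignored.
     */
    static List<Long> zeroPages(String path, int pageSize) throws IOException {
        List<Long> result = new ArrayList<>();
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            byte[] page = new byte[pageSize];
            long pageCount = file.length() / pageSize;
            for (long p = 0; p < pageCount; p++) {
                file.seek(p * pageSize);
                file.readFully(page);
                boolean allZero = true;
                for (byte b : page) {
                    if (b != 0) {
                        allZero = false;
                        break;
                    }
                }
                if (allZero) {
                    result.add(p);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: ZeroPageScan <container-file> <page-size>");
            return;
        }
        // Both arguments are supplied by the caller; nothing is inferred.
        for (long p : zeroPages(args[0], Integer.parseInt(args[1]))) {
            System.out.println("all-zero page: " + p);
        }
    }
}
```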



> corruption on z/OS with storerecovery oc_rec? tests.  ERROR XSLA7: Cannot 
> redo operation null in the log.
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: DERBY-4239
>                 URL: https://issues.apache.org/jira/browse/DERBY-4239
>             Project: Derby
>          Issue Type: Bug
>          Components: Store
>    Affects Versions: 10.5.1.1
>         Environment: z/OS z10 processor. 
> java version "1.6.0"
> Java(TM) SE Runtime Environment (build pmz3160sr4-20090219_01(SR4))
> IBM J9 VM (build 2.4, J2RE 1.6.0 IBM J9 2.4 z/OS s390-31 
> jvmmz3160-20090215_29883 (JIT enabled, AOT enabled)
> J9VM - 20090215_029883_bHdSMr
> JIT  - r9_20090213_2028
> GC   - 20090213_AA)
> JCL  - 20090218_01
> also 
> java version "1.6.0"
> Java(TM) SE Runtime Environment (build 
> pmz3160sr2ifix-20081021_01(SR2+IZ32776+IZ33456))
> IBM J9 VM (build 2.4, J2RE 1.6.0 IBM J9 2.4 z/OS s390-31 
> jvmmz3160ifx-20081010_24288 (JIT enabled, AOT enabled)
> J9VM - 20081009_024288_bHdSMr
> JIT  - r9_20080721_1330ifx2
> GC   - 20080724_AA)
> JCL  - 20080808_02
>            Reporter: Kathey Marsden
>            Priority: Critical
>         Attachments: derby.log, reproDerby4239.zip, wombat_with_keeplog.zip
>
>
> I saw corruption on z/OS with the storerecovery tests and 10.5.1.1.  The 
> failure comes in oc_rec3 when it tries to connect to the database, but the 
> actual problem seems to have occurred in the prior test, oc_rec2.  The 
> problem is intermittent, happening in approximately one run out of four.  I 
> extracted the case from the harness and will attach the reproduction; to use 
> it, run the script repro.ksh.  The script will loop up to 50 times until it 
> gets the failure, which looks like this:
> ERROR XSLA7: Cannot redo operation null in the log.
>       at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
>       at org.apache.derby.impl.store.raw.log.FileLogger.redo(Unknown Source)
>       at org.apache.derby.impl.store.raw.log.LogToFile.recover(Unknown Source)
>       at org.apache.derby.impl.store.raw.RawStore.boot(Unknown Source)
>       at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source)
>       at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source)
>       at org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source)
>       at org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown Source)
>       at org.apache.derby.impl.store.access.RAMAccessManager.boot(Unknown Source)
>       at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source)
>       at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source)
>       at org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source)
>       at org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown Source)
>       at org.apache.derby.impl.db.BasicDatabase.bootStore(Unknown Source)
>       at org.apache.derby.impl.db.BasicDatabase.boot(Unknown Source)
>       at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source)
>       at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source)
>       at org.apache.derby.impl.services.monitor.BaseMonitor.bootService(Unknown Source)
>       at org.apache.derby.impl.services.monitor.BaseMonitor.startProviderService(Unknown Source)
>       at org.apache.derby.impl.services.monitor.BaseMonitor.findProviderAndStartService(Unknown Source)
>       at org.apache.derby.impl.services.monitor.BaseMonitor.startPersistentService(Unknown Source)
>       at org.apache.derby.iapi.services.monitor.Monitor.startPersistentService(Unknown Source)
>       at org.apache.derby.impl.jdbc.EmbedConnection.bootDatabase(Unknown Source)
>       at org.apache.derby.impl.jdbc.EmbedConnection.<init>(Unknown Source)
>       at org.apache.derby.jdbc.Driver40.getNewEmbedConnection(Unknown Source)
>       at org.apache.derby.jdbc.InternalDriver.connect(Unknown Source)
>       at org.apache.derby.jdbc.AutoloadedDriver.connect(Unknown Source)
>       at java.sql.DriverManager.getConnection(DriverManager.java:311)
>       at java.sql.DriverManager.getConnection(DriverManager.java:268)
>       at CheckTables.main(CheckTables.java:8)
> Caused by: ERROR XSDBB: Unknown page format at page Page(16,Container(0, 1073)), page dump follows: Hex dump:
> 00000000: 0000 0000 0000 0000 0000 0000 0000 0000  ................
> 00000010: 0000 0000 0000 0000 0000 0000 0000 0000  ................
> <snip lots of 000's>
> I ran it with 10.3 and it completed all 50 iterations, so whether it is a JVM 
> or a Derby issue, it seems new since 10.3 (I haven't tried with 10.4).  
> Oddly, I have run the tests many times before on this machine using the 
> 10.5.1.1 release and the same JVM and have never seen this failure, so I am 
> looking into whether something changed on the machine or environment.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
