[
https://issues.apache.org/jira/browse/DERBY-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mike Matrigali updated DERBY-4239:
----------------------------------
In the normal case, where a checkpoint is requested automatically (by default
once we think we have logged approximately 10 MB of log), we don't want to
start a new one if one is already running. The only reason we are doing a
checkpoint in this case is to minimize recovery time if we happen to crash.
If there is already a checkpoint in progress, then that is good enough. There
is no correctness need for a checkpoint to start NOW and for the caller to
wait for it to finish. Checkpoints can really slow down the overall throughput
of the system, especially if the user has increased the cache size, so we
don't want to do additional ones if they are unnecessary.
I am not sure what backup needs.
In the case of the user-callable routine, we don't really say much about what
it does:
The SYSCS_UTIL.SYSCS_CHECKPOINT_DATABASE system procedure checkpoints the
database by flushing all cached data to disk.
But I would lean toward changing its behavior to also do another checkpoint
if one is already in progress.
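For reference, a minimal sketch of how a user invokes the routine over JDBC.
The class and method names (CheckpointCall, checkpoint) are hypothetical and
only for illustration; the procedure name is the documented one, and the
caller is assumed to already hold an open Connection to the database.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.SQLException;

/** Hypothetical helper, not part of Derby: wraps the user-callable checkpoint routine. */
public class CheckpointCall {
    // The documented system procedure; it takes no arguments.
    static final String CHECKPOINT_SQL = "CALL SYSCS_UTIL.SYSCS_CHECKPOINT_DATABASE()";

    /** Asks the database behind conn to checkpoint and returns when the call returns. */
    public static void checkpoint(Connection conn) throws SQLException {
        try (CallableStatement cs = conn.prepareCall(CHECKPOINT_SQL)) {
            cs.execute();
        }
    }
}
```

What the call guarantees when a checkpoint is already in progress is exactly
the open question here; the sketch only shows the invocation.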
I am tempted to change the patch to eliminate the wait parameter. Instead, all
code that currently requests a wait would always force a new checkpoint and
wait for it, even if it finds a checkpoint in progress. If I make this change
I will make sure the "normal" checkpoint does not call this path. Any
opinions? It would be nice if we could generate bug scripts that show the
specific bugs that are fixed by adding the additional checkpoints, but this is
hard, as is evidenced by the fact that we still don't have a perfect repro for
the compress bug.
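To make the two paths concrete, here is a rough sketch of the split being
proposed. It is not the patch and not Derby's store code: the class
CheckpointCoordinator and its methods are hypothetical, and the actual
checkpoint work is passed in as a Runnable. The "normal" path skips the
request if a checkpoint is already running; the forced path waits for any
running checkpoint to finish and then runs a new one.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical illustration of the two checkpoint request paths. */
public class CheckpointCoordinator {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition finished = lock.newCondition();
    private boolean inProgress = false;
    private int completed = 0;

    /** "Normal" path: if a checkpoint is already running, that is good enough. */
    public boolean requestIfIdle(Runnable doCheckpoint) {
        lock.lock();
        try {
            if (inProgress) {
                return false; // skip, do not start another one
            }
            inProgress = true;
        } finally {
            lock.unlock();
        }
        runAndRelease(doCheckpoint);
        return true;
    }

    /** Forced path: wait out any running checkpoint, then do a new one. */
    public void forceNewAndWait(Runnable doCheckpoint) {
        lock.lock();
        try {
            while (inProgress) {
                finished.awaitUninterruptibly();
            }
            inProgress = true;
        } finally {
            lock.unlock();
        }
        runAndRelease(doCheckpoint);
    }

    // Runs the checkpoint outside the lock, then clears the flag and wakes waiters.
    private void runAndRelease(Runnable doCheckpoint) {
        boolean ok = false;
        try {
            doCheckpoint.run();
            ok = true;
        } finally {
            lock.lock();
            try {
                inProgress = false;
                if (ok) {
                    completed++;
                }
                finished.signalAll();
            } finally {
                lock.unlock();
            }
        }
    }

    /** Number of checkpoints that ran to completion. */
    public int completedCount() {
        lock.lock();
        try {
            return completed;
        } finally {
            lock.unlock();
        }
    }
}
```

With this shape there is no wait flag to pass around: the log-size trigger
would only ever call requestIfIdle, and callers that need a checkpoint begun
after their own changes would call forceNewAndWait.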
> corruption on z/OS with storerecovery oc_rec? tests. ERROR XSLA7: Cannot
> redo operation null in the log.
> ---------------------------------------------------------------------------------------------------------
>
> Key: DERBY-4239
> URL: https://issues.apache.org/jira/browse/DERBY-4239
> Project: Derby
> Issue Type: Bug
> Components: Store
> Affects Versions: 10.1.3.3, 10.2.2.1, 10.3.2.1, 10.4.2.0, 10.5.1.1,
> 10.6.0.0
> Environment: z/OS z10 processor.
> java version "1.6.0"
> Java(TM) SE Runtime Environment (build pmz3160sr4-20090219_01(SR4))
> IBM J9 VM (build 2.4, J2RE 1.6.0 IBM J9 2.4 z/OS s390-31
> jvmmz3160-20090215_29883 (JIT enabled, AOT enabled)
> J9VM - 20090215_029883_bHdSMr
> JIT - r9_20090213_2028
> GC - 20090213_AA)
> JCL - 20090218_01
> also
> java version "1.6.0"
> Java(TM) SE Runtime Environment (build
> pmz3160sr2ifix-20081021_01(SR2+IZ32776+IZ33456))
> IBM J9 VM (build 2.4, J2RE 1.6.0 IBM J9 2.4 z/OS s390-31
> jvmmz3160ifx-20081010_24288 (JIT enabled, AOT enabled)
> J9VM - 20081009_024288_bHdSMr
> JIT - r9_20080721_1330ifx2
> GC - 20080724_AA)
> JCL - 20080808_02
> Reporter: Kathey Marsden
> Assignee: Mike Matrigali
> Priority: Critical
> Attachments: badlogsizes.txt, derby-4239_1.diff, derby.log,
> derby.log, derby_dumponly.zip, goodlogsizes.txt, identifyBadContainer.ksh,
> reproBackgroundCheckpoint.zip, reproDerby4239.zip,
> wombat_keeplog_notcorrupt.zip, wombat_with_keeplog.zip
>
>
> I saw corruption on z/OS with the storerecovery tests and 10.5.1.1. The
> failure comes in oc_rec3 trying to connect to the database, but the actual
> problem seems to have occurred with the prior test oc_rec2. The problem is
> somewhat intermittent, happening approximately 1/4 times. I extracted the
> case from the harness and will attach the reproduction and run the script
> repro.ksh. The script will loop up to 50 times until it gets the failure,
> which looks like this:
> ERROR XSLA7: Cannot redo operation null in the log.
> at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
> at org.apache.derby.impl.store.raw.log.FileLogger.redo(Unknown Source)
> at org.apache.derby.impl.store.raw.log.LogToFile.recover(Unknown Source)
> at org.apache.derby.impl.store.raw.RawStore.boot(Unknown Source)
> at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source)
> at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source)
> at org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source)
> at org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown Source)
> at org.apache.derby.impl.store.access.RAMAccessManager.boot(Unknown Source)
> at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source)
> at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source)
> at org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source)
> at org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown Source)
> at org.apache.derby.impl.db.BasicDatabase.bootStore(Unknown Source)
> at org.apache.derby.impl.db.BasicDatabase.boot(Unknown Source)
> at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source)
> at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source)
> at org.apache.derby.impl.services.monitor.BaseMonitor.bootService(Unknown Source)
> at org.apache.derby.impl.services.monitor.BaseMonitor.startProviderService(Unknown Source)
> at org.apache.derby.impl.services.monitor.BaseMonitor.findProviderAndStartService(Unknown Source)
> at org.apache.derby.impl.services.monitor.BaseMonitor.startPersistentService(Unknown Source)
> at org.apache.derby.iapi.services.monitor.Monitor.startPersistentService(Unknown Source)
> at org.apache.derby.impl.jdbc.EmbedConnection.bootDatabase(Unknown Source)
> at org.apache.derby.impl.jdbc.EmbedConnection.<init>(Unknown Source)
> at org.apache.derby.jdbc.Driver40.getNewEmbedConnection(Unknown Source)
> at org.apache.derby.jdbc.InternalDriver.connect(Unknown Source)
> at org.apache.derby.jdbc.AutoloadedDriver.connect(Unknown Source)
> at java.sql.DriverManager.getConnection(DriverManager.java:311)
> at java.sql.DriverManager.getConnection(DriverManager.java:268)
> at CheckTables.main(CheckTables.java:8)
> Caused by: ERROR XSDBB: Unknown page format at page Page(16,Container(0,
> 1073)), page dump follows: Hex dump:
> 00000000: 0000 0000 0000 0000 0000 0000 0000 0000 ................
> 00000010: 0000 0000 0000 0000 0000 0000 0000 0000 ................
> <snip lots of 000's>
> I ran it with 10.3 and it completed all 50 iterations, so whether it is a
> JVM or a Derby issue, it seems new since 10.3. (I haven't tried with 10.4.)
> Oddly, I have run the tests many times before on this machine using the
> 10.5.1.1 release and the same JVM and have never seen this failure, so I am
> looking into whether something changed on the machine or environment.