[ https://issues.apache.org/jira/browse/DERBY-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dag H. Wanvik updated DERBY-5325:
---------------------------------

    Attachment: derby-5325a.stat
                derby-5325a.diff

Uploading a patch for this issue, derby-5325a.

With NIO, writeRAFHeader calls two methods that lead to interruptible IO:
 - getEmbryonicPage
 - writeHeader
 
Currently, getEmbryonicPage may throw InterruptDetectedException, and hence so
may writeRAFHeader.

writeHeader may throw ClosedByInterruptException, AsynchronousCloseException
and ClosedChannelException because it does not use RAFContainer4#writePage but
rather RAFContainer4#writeAtOffset, which does not currently attempt to recover
after an interrupt.
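To make the NIO side concrete, here is a minimal standalone sketch (plain JDK code, not Derby code) of the behaviour an unrecovered write runs into. It relies only on the documented InterruptibleChannel contract: if a thread's interrupt status is set when it performs blocking IO on a FileChannel, the channel is closed and that thread gets ClosedByInterruptException; any later operation on the same channel fails with a plain ClosedChannelException. The class NioInterruptDemo and its methods are hypothetical. For determinism both writes run on one thread, whereas in the reported failure the second write came from the checkpoint daemon.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class (not part of Derby): shows how an interrupt during a
// FileChannel write closes the channel for every later user of it.
final class NioInterruptDemo {

    /** Exception names seen by the two writes, plus the channel state afterwards. */
    static final class Outcome {
        final String firstWrite;
        final String secondWrite;
        final boolean channelOpenAfter;

        Outcome(String firstWrite, String secondWrite, boolean channelOpenAfter) {
            this.firstWrite = firstWrite;
            this.secondWrite = secondWrite;
            this.channelOpenAfter = channelOpenAfter;
        }
    }

    static Outcome run() {
        try {
            Path file = Files.createTempFile("derby-5325-demo", ".dat");
            try (FileChannel ch = FileChannel.open(file, StandardOpenOption.WRITE)) {
                // First write: the writing thread has been interrupted.
                String first = writeWhileInterrupted(ch);
                // Second write: a later caller (the checkpoint thread in DERBY-5325)
                // finds the channel already closed.
                String second = writeAtZero(ch);
                return new Outcome(first, second, ch.isOpen());
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static String writeWhileInterrupted(FileChannel ch) {
        Thread.currentThread().interrupt(); // interrupt status set before the IO starts
        try {
            return writeAtZero(ch);
        } finally {
            Thread.interrupted(); // clear the flag so the demo leaves no side effects
        }
    }

    /** Positional write of a few bytes; returns "ok" or the exception's simple name. */
    private static String writeAtZero(FileChannel ch) {
        try {
            ch.write(ByteBuffer.wrap(new byte[] {1, 2, 3, 4}), 0L);
            return "ok";
        } catch (IOException e) {
            return e.getClass().getSimpleName();
        }
    }
}
```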

So currently, clients of writeRAFHeader need to be prepared for all of 
InterruptDetectedException, ClosedByInterruptException, 
AsynchronousCloseException and ClosedChannelException.
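The three NIO exceptions in that list are not independent: ClosedByInterruptException extends AsynchronousCloseException, which extends ClosedChannelException (an IOException). A handler for ClosedChannelException therefore catches all three, while telling them apart requires testing the most specific type first. The sketch below is illustrative only; ChannelClosureKind and its Kind values are hypothetical names, and the cause attached to each exception follows the java.nio.channels Javadoc. InterruptDetectedException is a Derby exception, not one of these NIO types, so it is not covered here.

```java
import java.io.IOException;
import java.nio.channels.AsynchronousCloseException;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.ClosedChannelException;

// Hypothetical helper: classify channel-closure exceptions a caller of a
// writeRAFHeader-like method may see. Subclasses must be tested before
// their superclasses, since
// ClosedByInterruptException < AsynchronousCloseException < ClosedChannelException.
final class ChannelClosureKind {

    enum Kind {
        THIS_THREAD_INTERRUPTED, // this thread was interrupted during the IO
        CLOSED_WHILE_BLOCKED,    // another thread closed the channel during the IO
        ALREADY_CLOSED,          // channel was closed before the IO started
        OTHER_IO_ERROR
    }

    static Kind classify(IOException e) {
        if (e instanceof ClosedByInterruptException) {
            return Kind.THIS_THREAD_INTERRUPTED;
        } else if (e instanceof AsynchronousCloseException) {
            return Kind.CLOSED_WHILE_BLOCKED;
        } else if (e instanceof ClosedChannelException) {
            return Kind.ALREADY_CLOSED;
        }
        return Kind.OTHER_IO_ERROR;
    }
}
```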

writeRAFHeader is used in three locations:

 - RAFContainer#clean
 - RAFContainer#run(CREATE_CONTAINER_ACTION)
 - RAFContainer#run(STUBBIFY_ACTION)

RAFContainer#clean is prepared only for InterruptDetectedException. The stack
trace in this issue shows that ClosedChannelException may also occur, and clean
is not prepared for that (this bug).

RAFContainer#run(CREATE_CONTAINER_ACTION) is prepared for 
ClosedByInterruptException and AsynchronousCloseException. Since IO during 
container creation is single-threaded, this is sufficient: it should never need 
to handle ClosedChannelException or InterruptDetectedException, both of which
signal that another thread saw an interrupt on the container channel.

RAFContainer#run(STUBBIFY_ACTION) is part of the removeContainer operation,
which should happen after the container is closed, so it should be
single-threaded on the container as well (?). It should handle
ClosedByInterruptException and AsynchronousCloseException and retry, but
currently does not.

If we let RAFContainer4#writeAtOffset (and hence writeHeader) clean up after an
interrupt just like writePage does, it would only throw
InterruptDetectedException, meaning that another thread saw an interrupt and
the caller should retry. This would simplify the logic in RAFContainer: we could
remove the retry logic from RAFContainer#run(CREATE_CONTAINER_ACTION). This
could also cover the retry logic for RAFContainer#run(STUBBIFY_ACTION) with
respect to its use of writeRAFHeader.
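A hypothetical sketch of the proposed writeAtOffset behaviour, assuming the channel write and the channel reopen can be passed in as hooks (ChannelWrite and Reopen are invented for illustration, and InterruptDetectedException here is a local stand-in, not Derby's class). It is not taken from derby-5325a.diff. If this thread's own IO was cut short (AsynchronousCloseException, which includes ClosedByInterruptException), the writer reopens the channel and retries; if the channel was already closed because another thread saw an interrupt (plain ClosedChannelException), it throws InterruptDetectedException so the caller retries.

```java
import java.io.IOException;
import java.nio.channels.AsynchronousCloseException;
import java.nio.channels.ClosedChannelException;

// Hypothetical sketch (not Derby code) of a writeAtOffset that cleans up after
// interrupts itself, so its callers only need to handle InterruptDetectedException.
final class RecoveringWriter {

    /** Local stand-in for Derby's InterruptDetectedException. */
    static final class InterruptDetectedException extends Exception {
        private static final long serialVersionUID = 1L;

        InterruptDetectedException() {
            super("another thread saw an interrupt on the container channel; retry");
        }
    }

    /** Hypothetical hook: one positional write on the container channel. */
    interface ChannelWrite {
        void write(long offset) throws IOException;
    }

    /** Hypothetical hook: reopen the container channel after it was closed. */
    interface Reopen {
        void reopen() throws IOException;
    }

    static void writeAtOffset(ChannelWrite io, Reopen reopen, long offset, int maxRetries)
            throws IOException, InterruptDetectedException {
        boolean sawInterrupt = false;
        int retries = 0;
        try {
            while (true) {
                try {
                    io.write(offset);
                    return;
                } catch (AsynchronousCloseException e) {
                    // Our own IO was cut short (this also catches ClosedByInterruptException):
                    // reopen and retry, giving up after maxRetries retries.
                    if (retries++ >= maxRetries) {
                        throw e;
                    }
                    // Clear the flag, or the retry would be interrupted again at once.
                    sawInterrupt |= Thread.interrupted();
                    reopen.reopen();
                } catch (ClosedChannelException e) {
                    // Channel was already closed, i.e. another thread saw the interrupt.
                    throw new InterruptDetectedException();
                }
            }
        } finally {
            if (sawInterrupt) {
                Thread.currentThread().interrupt(); // restore the caller's interrupt status
            }
        }
    }
}
```

Clearing the interrupt flag before retrying matters because ClosedByInterruptException leaves the thread's interrupt status set, so an immediate retry on the reopened channel would be closed again. Restoring the flag in `finally` keeps the interrupt visible to higher layers.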

Next, RAFContainer#clean already handles InterruptDetectedException, and with
this change it would no longer see ClosedByInterruptException,
AsynchronousCloseException or ClosedChannelException. This should solve
DERBY-5325.
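On the caller side, handling InterruptDetectedException then reduces to a bounded retry loop around the header write. Another hypothetical sketch: CleanRetry, HeaderWrite and the attempt limit are illustrative names, InterruptDetectedException is again a local stand-in, and this is not how RAFContainer#clean is actually written.

```java
// Hypothetical sketch (not Derby code) of a clean()-style caller that only has to
// cope with InterruptDetectedException from the header write.
final class CleanRetry {

    /** Local stand-in for Derby's InterruptDetectedException. */
    static final class InterruptDetectedException extends Exception {
        private static final long serialVersionUID = 1L;
    }

    /** Hypothetical hook standing in for writeRAFHeader. */
    interface HeaderWrite {
        void write() throws InterruptDetectedException;
    }

    /** Writes the header, retrying after InterruptDetectedException; returns attempts used. */
    static int cleanWithRetry(HeaderWrite header, int maxAttempts)
            throws InterruptDetectedException {
        for (int attempt = 1; ; attempt++) {
            try {
                header.write();
                return attempt;
            } catch (InterruptDetectedException e) {
                if (attempt >= maxAttempts) {
                    throw e; // give up and let the caller report the error
                }
                // Another thread is recovering the channel; simply try again.
            }
        }
    }
}
```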

I have not added a new test for this issue yet, since I don't know how to force
this scenario; I believe we have only seen it once. I'll be running
InterruptResilienceTest continuously with this patch, along with the patch for
DERBY-5312, on several platforms to gain more confidence.


> Checkpoint fails with ClosedChannelException in InterruptResilienceTest
> -----------------------------------------------------------------------
>
>                 Key: DERBY-5325
>                 URL: https://issues.apache.org/jira/browse/DERBY-5325
>             Project: Derby
>          Issue Type: Bug
>          Components: Store
>    Affects Versions: 10.9.0.0
>         Environment: Solaris 10 5/08 s10x_u5wos_10 X86
> Java(TM) SE Runtime Environment (build 1.7.0-b147)
> Java HotSpot(TM) 64-Bit Server VM (build 21.0-b17 mixed mode)
>            Reporter: Knut Anders Hatlen
>            Assignee: Dag H. Wanvik
>         Attachments: derby-5325a.diff, derby-5325a.stat, derby.log, error-stacktrace.out
>
>
> Seen here: 
> http://dbtg.foundry.sun.com/derby/test/Daily/jvm1.7/testing/testlog/sol/1144688-suitesAll_diff.txt
> There was 1 error:
> 1) testRAFWriteInterrupted(org.apache.derbyTesting.functionTests.tests.store.InterruptResilienceTest)java.sql.SQLException: The exception 'java.sql.SQLException: Log Record has been sent to the stream, but it cannot be applied to the store (Object null).  This may cause recovery problems also.' was thrown while evaluating an expression.
>       at org.apache.derby.impl.jdbc.SQLExceptionFactory40.getSQLException(Unknown Source)
>       at org.apache.derby.impl.jdbc.Util.newEmbedSQLException(Unknown Source)
>       at org.apache.derby.impl.jdbc.Util.seeNextException(Unknown Source)
>       at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown Source)
>       at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown Source)
>       at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown Source)
>       at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown Source)
>       at org.apache.derby.impl.jdbc.EmbedStatement.executeStatement(Unknown Source)
>       at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source)
>       at org.apache.derby.impl.jdbc.EmbedStatement.executeUpdate(Unknown Source)
>       at org.apache.derbyTesting.functionTests.tests.store.InterruptResilienceTest.testRAFWriteInterrupted(InterruptResilienceTest.java:217)
> (...)
> Caused by: java.nio.channels.ClosedChannelException
>       at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:94)
>       at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:691)
>       at org.apache.derby.impl.store.raw.data.RAFContainer4.writeFull(Unknown Source)
>       at org.apache.derby.impl.store.raw.data.RAFContainer4.writeAtOffset(Unknown Source)
>       at org.apache.derby.impl.store.raw.data.FileContainer.writeHeader(Unknown Source)
>       at org.apache.derby.impl.store.raw.data.RAFContainer.writeRAFHeader(Unknown Source)
>       at org.apache.derby.impl.store.raw.data.RAFContainer.clean(Unknown Source)
>       at org.apache.derby.impl.services.cache.ConcurrentCache.cleanAndUnkeepEntry(Unknown Source)
>       at org.apache.derby.impl.services.cache.ConcurrentCache.cleanCache(Unknown Source)
>       at org.apache.derby.impl.services.cache.ConcurrentCache.cleanAll(Unknown Source)
>       at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.checkpoint(Unknown Source)
>       at org.apache.derby.impl.store.raw.log.LogToFile.checkpointWithTran(Unknown Source)
>       at org.apache.derby.impl.store.raw.log.LogToFile.checkpoint(Unknown Source)
>       at org.apache.derby.impl.store.raw.RawStore.checkpoint(Unknown Source)
>       at org.apache.derby.impl.store.raw.log.LogToFile.performWork(Unknown Source)
>       at org.apache.derby.impl.services.daemon.BasicDaemon.serviceClient(Unknown Source)
>       at org.apache.derby.impl.services.daemon.BasicDaemon.work(Unknown Source)
>       at org.apache.derby.impl.services.daemon.BasicDaemon.run(Unknown Source)
>       at java.lang.Thread.run(Thread.java:722)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
