[ 
https://issues.apache.org/jira/browse/DERBY-4741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923238#action_12923238
 ] 

Dag H. Wanvik edited comment on DERBY-4741 at 10/20/10 7:50 PM:
----------------------------------------------------------------

Note to self: I have found a problem with the interrupt recovery of
RAFContainer4. Its call to openContainer needs to be protected by the
monitor on FileContainer#allocCache, because opening a container
eventually leads to a call on AllocationCache#reset. The AllocationCache
javadoc states that callers need to synchronize themselves since it is
itself not MT safe.

Threads inside RAFContainer4#{readPage, writePage} do not necessarily
own this monitor when recovery is attempted.

I did once see a race condition due to this.  In the race condition, a
thread was trying to write to a new page and got an array out of bounds
exception inside AllocationCache.validate (numExtents was suddenly back to 0)
because another thread was doing interrupt recovery by calling
RAFContainer4#recoverContainerAfterInterrupt ->
openContainer -> ... -> AllocationCache#reset (unprotected).

[edit add]:
Simply enveloping recoverContainerAfterInterrupt's call to openContainer in
synchronized(allocCache) won't work: it can lead to deadlock.
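
For illustration only, here is a minimal sketch of the caller-side
synchronization that the AllocationCache javadoc asks for. It is not a fix
for RAFContainer4: as noted above, wrapping the openContainer call in the
monitor can deadlock in Derby. AllocCacheSketch and GuardedCacheDemo are
hypothetical stand-ins, not Derby classes; the only point shown is that a
reset done under the same monitor as the writer cannot pull numExtents back
to 0 in the middle of the writer's critical section.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a cache that is not thread safe by itself.
// This is NOT Derby's AllocationCache; names and fields are assumptions.
final class AllocCacheSketch {
    private long[] extents = new long[0];
    private int numExtents = 0;

    void addExtent(long firstPage) {
        extents = Arrays.copyOf(extents, numExtents + 1);
        extents[numExtents++] = firstPage;
    }

    void reset() {
        extents = new long[0];
        numExtents = 0;
    }

    // Throws ArrayIndexOutOfBoundsException when numExtents is 0.
    long lastExtent() {
        return extents[numExtents - 1];
    }
}

final class GuardedCacheDemo {
    // Runs a "writer" and a "recovery" thread against one cache and
    // returns the number of runtime exceptions the writer observed.
    static int run(final int iterations) {
        final AllocCacheSketch allocCache = new AllocCacheSketch();
        final AtomicInteger failures = new AtomicInteger();

        Thread writer = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                // Both steps happen under the cache's monitor, so a
                // reset cannot slip in between them.
                synchronized (allocCache) {
                    try {
                        allocCache.addExtent(i);
                        allocCache.lastExtent();
                    } catch (RuntimeException e) {
                        failures.incrementAndGet();
                    }
                }
            }
        });

        Thread recovery = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                // The reset takes the same monitor as the writer.
                synchronized (allocCache) {
                    allocCache.reset();
                }
            }
        });

        writer.start();
        recovery.start();
        try {
            writer.join();
            recovery.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("demo interrupted", e);
        }
        return failures.get();
    }

    public static void main(String[] args) {
        System.out.println("writer failures: " + run(10_000));
    }
}
```

If the synchronized block around reset() is removed, the writer can
intermittently fail in lastExtent(), which resembles the failure mode
described above.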

> Make Derby work reliably in the presence of thread interrupts
> -------------------------------------------------------------
>
>                 Key: DERBY-4741
>                 URL: https://issues.apache.org/jira/browse/DERBY-4741
>             Project: Derby
>          Issue Type: Bug
>          Components: Store
>    Affects Versions: 10.2.1.6, 10.2.2.0, 10.3.1.4, 10.3.2.1, 10.3.3.0, 
> 10.4.1.3, 10.4.2.0, 10.5.1.1, 10.5.2.0, 10.5.3.0, 10.6.1.0
>            Reporter: Dag H. Wanvik
>            Assignee: Dag H. Wanvik
>         Attachments: derby-4741-all+lenient+resurrect.diff, 
> derby-4741-all+lenient+resurrect.stat, 
> derby-4741-nio-container+log+waits+locks+throws.diff, 
> derby-4741-nio-container+log+waits+locks+throws.stat, 
> derby-4741-nio-container+log+waits+locks-2.diff, 
> derby-4741-nio-container+log+waits+locks-2.stat, 
> derby-4741-nio-container+log+waits+locks.diff, 
> derby-4741-nio-container+log+waits+locks.stat, 
> derby-4741-nio-container+log+waits.diff, 
> derby-4741-nio-container+log+waits.stat, derby-4741-nio-container+log.diff, 
> derby-4741-nio-container+log.stat, derby-4741-nio-container-2.diff, 
> derby-4741-nio-container-2.log, derby-4741-nio-container-2.stat, 
> derby-4741-nio-container-2b.diff, derby-4741-nio-container-2b.stat, 
> derby.log, derby.log, MicroAPITest.java, xsbt0.log.gz
>
>
> When not executing on a small device VM, Derby has been using the Java NIO 
> classes java.nio.channels.* for file io.
> If a thread is interrupted while executing blocking IO operations in NIO, the 
> ClosedByInterruptException will get thrown. Unfortunately, Derby isn't 
> currently architected to retry and complete such operations (before passing on 
> the interrupt), so the Derby database can be left in an inconsistent state 
> and we therefore have to return a database level error. This means the 
> applications can no longer access the database without a shutdown and reboot 
> including a recovery.
> It would be nice if Derby could somehow detect and finish IO operations 
> underway when thread interrupts happen before passing the exception on to the 
> application. Derby embedded is sometimes embedded in applications that use 
> Thread.interrupt to stop threads.
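
As a small illustration of the NIO behaviour described in the issue, the
sketch below sets the current thread's interrupt flag and then performs a
blocking read on a FileChannel. It uses only standard JDK classes;
InterruptedNioDemo and its temp file are hypothetical and have nothing to do
with Derby's container code. The channel is closed as a side effect of the
interrupt, which is why the operation cannot simply be retried on the same
channel.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class InterruptedNioDemo {
    // Reads from a temp file with the interrupt flag set and returns a
    // short description of what happened to the channel.
    static String readWhileInterrupted() {
        Path file = null;
        try {
            file = Files.createTempFile("interrupt-demo", ".dat");
            Files.write(file, new byte[] {1, 2, 3, 4});
            try (FileChannel channel =
                     FileChannel.open(file, StandardOpenOption.READ)) {
                // Simulate an application calling Thread.interrupt()
                // on the thread that is about to do file IO.
                Thread.currentThread().interrupt();
                try {
                    channel.read(ByteBuffer.allocate(4));
                    return "read completed";
                } catch (ClosedByInterruptException e) {
                    return "ClosedByInterruptException; channel open: "
                            + channel.isOpen();
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            // Clear the flag so the caller is not left interrupted,
            // then remove the temp file.
            Thread.interrupted();
            if (file != null) {
                try {
                    Files.deleteIfExists(file);
                } catch (IOException ignored) {
                    // Best-effort cleanup only.
                }
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(readWhileInterrupted());
    }
}
```

Any recovery therefore has to reopen the file before redoing the IO, which
is where the openContainer call discussed in the comment above comes in.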
