[ 
https://issues.apache.org/jira/browse/SLING-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carsten Ziegeler closed SLING-4477.
-----------------------------------

> JcrInstaller should not call Thread.interrupt()
> -----------------------------------------------
>
>                 Key: SLING-4477
>                 URL: https://issues.apache.org/jira/browse/SLING-4477
>             Project: Sling
>          Issue Type: Improvement
>          Components: Installer, JCR
>            Reporter: Thomas Mueller
>            Assignee: Carsten Ziegeler
>             Fix For: JCR Installer 3.1.14
>
>
> There JcrInstaller calls Thread.interrupt() where it's dangerous and not 
> necessary. Thread.interrupt is dangerous because it closes files (when using 
> the FileChannel API), including Lucene files, and the Oak persistent cache 
> files. All further I/O operations with that file, including I/O operations on 
> other threads, will then fail (see ClosedByInterruptException for details). 
> OAK-2571 protects against closing persistent cache files, by reopening the 
> files. But it results in slower performance and ugly log messages. 
> Thread.interrupt is also dangerous because it does not work as expected if 
> some code catches InterruptedException and does not re-throw it. See 
> http://stackoverflow.com/questions/2020992/is-thread-interrupt-evil
> Thread.interrupt is not necessary in most cases. Instead, a simple "volatile 
> boolean" flag is sufficient, and much safer.
> The JCR installer uses a boolean flag (active), but also Thread.interrupt, to 
> stop the sleep period.
> Just before the problem occurs, I see the following messages in the log file:
> {noformat}
> 05.03.2015 15:22:26.642 *INFO* [CM Event Dispatcher (Fire ConfigurationEvent: 
> pid=org.apache.sling.installer.provider.jcr.impl.JcrInstaller)] 
> org.apache.sling.installer.provider.jcr.impl.JcrInstaller Deactivating Apache 
> Sling JCR Installer
> {noformat}
> The very last message is from the 
> org.apache.sling.installer.provider.jcr.impl.JcrInstaller, who calls:
> {noformat}
> backgroundThread.interrupt();
> {noformat}
> One possible solution is: in JcrInstaller, instead of:
> {noformat}
>         try {
>             Thread.sleep(RUN_LOOP_DELAY_MSEC);
>         } catch (final InterruptedException ignore) {
>             // ignore
>         }
> {noformat}
> use:
> {noformat}
>         synchronized (this) {
>             if (active) {
>                 try {
>                     wait(RUN_LOOP_DELAY_MSEC);
>                 } catch (final InterruptedException ignore) {
>                     // ignore
>                 }
>             }
>         } 
> {noformat}
> and instead of:
> {noformat}
>         backgroundThread.active = false;
>         logger.debug("Waiting for " + backgroundThread.getName() + " Thread 
> to end...");
>         backgroundThread.interrupt();
> {noformat}
> use:
> {noformat}
>         synchronized (backgroundThread) {
>             backgroundThread.active = false;
>             backgroundThread.notifyAll();
>         } 
>         logger.debug("Waiting for " + backgroundThread.getName() + " Thread 
> to end...");
> {noformat}
> That's also better than what we have now, because right now, _any_ code 
> within "runOneCycle" (including library and Oak code) that is doing "catch 
> (InterruptedException x) { }" will let the 
> "Thread.sleep(RUN_LOOP_DELAY_MSEC)" sleep one second too long.
> And people catch and ignore InterruptedException a _lot_. Using wait and 
> notifyAll does not suffer from this, and has no risk of trouble.
> An alternative is to simply remove backgroundThread.interrupt(), and change 
> the loop to:
> {noformat}
>         for (int i = 0; i < 100 && active; i++) {
>             try {
>                 Thread.sleep(RUN_LOOP_DELAY_MSEC / 100);
>             } catch (final InterruptedException ignore) {
>                 // ignore
>             }
>         }
> {noformat}
> This will let it sleep 10 ms too long at most. It requires that 
> RUN_LOOP_DELAY_MSEC is >= 100, which is ugly.
> In this case, the field "active" needs to be volatile. But is needed even for 
> the current code (it's better to fix that in all cases).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to