Hi Konrad,

On Sun, 2020-05-03 at 11:32 +0200, Konrad Windszus wrote:
> Hi,
> I just experienced this exception when starting Sling Starter
> 12-SNAPSHOT:
>
> 03.05.2020 11:22:33.372 *INFO* [CM Event Dispatcher (Fire ConfigurationEvent: pid=org.apache.jackrabbit.oak.spi.security.user.action.DefaultAuthorizableActionProvider)] org.apache.jackrabbit.oak.security.internal.SecurityProviderRegistration Trying to unregister the SecurityProvider...
> 03.05.2020 11:22:33.405 *ERROR* [Apache Sling Repository Startup Thread #1] org.apache.sling.jcr.oak.server.internal.OakSlingRepositoryManager start: Uncaught Throwable trying to access Repository, calling stopRepository()
> java.lang.RuntimeException: org.apache.jackrabbit.oak.api.CommitFailedException: OakSegment0002: Merge interrupted
>     at org.apache.jackrabbit.oak.OakInitializer.initialize(OakInitializer.java:50) [org.apache.jackrabbit.oak-core:1.26.0]
>     at org.apache.jackrabbit.oak.Oak.initialContent(Oak.java:689) [org.apache.jackrabbit.oak-core:1.26.0]
>     at org.apache.jackrabbit.oak.Oak.createNewContentRepository(Oak.java:734) [org.apache.jackrabbit.oak-core:1.26.0]
>     at org.apache.jackrabbit.oak.Oak.createContentRepository(Oak.java:673) [org.apache.jackrabbit.oak-core:1.26.0]
>     at org.apache.jackrabbit.oak.jcr.Jcr.createContentRepository(Jcr.java:376) [org.apache.jackrabbit.oak-jcr:1.26.0]
>     at org.apache.sling.jcr.oak.server.internal.OakSlingRepositoryManager.acquireRepository(OakSlingRepositoryManager.java:152) [org.apache.sling.jcr.oak.server:1.2.4]
>     at org.apache.sling.jcr.base.AbstractSlingRepositoryManager.initializeAndRegisterRepositoryService(AbstractSlingRepositoryManager.java:515) [org.apache.sling.jcr.base:3.1.0]
>     at org.apache.sling.jcr.base.AbstractSlingRepositoryManager.access$300(AbstractSlingRepositoryManager.java:92) [org.apache.sling.jcr.base:3.1.0]
>     at org.apache.sling.jcr.base.AbstractSlingRepositoryManager$4.run(AbstractSlingRepositoryManager.java:496) [org.apache.sling.jcr.base:3.1.0]
> Caused by: org.apache.jackrabbit.oak.api.CommitFailedException: OakSegment0002: Merge interrupted
>     at org.apache.jackrabbit.oak.segment.scheduler.LockBasedScheduler.schedule(LockBasedScheduler.java:284) [org.apache.jackrabbit.oak-segment-tar:1.26.0]
>     at org.apache.jackrabbit.oak.segment.SegmentNodeStore.merge(SegmentNodeStore.java:211) [org.apache.jackrabbit.oak-segment-tar:1.26.0]
>     at org.apache.jackrabbit.oak.OakInitializer.initialize(OakInitializer.java:48) [org.apache.jackrabbit.oak-core:1.26.0]
>     ... 8 common frames omitted
> Caused by: java.lang.InterruptedException: null
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1302)
>     at java.util.concurrent.Semaphore.acquire(Semaphore.java:312)
>     at org.apache.jackrabbit.oak.segment.scheduler.LockBasedScheduler.schedule(LockBasedScheduler.java:262) [org.apache.jackrabbit.oak-segment-tar:1.26.0]
>     ... 10 common frames omitted
> I don't know what lead to the restart of the Oak service itself
> (probably some reconfigurations) and I remember vaguely there once
> was a JIRA issue about that. Couldn't find it though, so any pointers
> are appreciated.
>
> Should I open this in a new issue or add as comment to any existing
> one?

I see you correctly identified SLING-7811 [1] as the place where we discuss the root cause. I think the problem surfaces at different levels:

1. We start components and then apply configurations later, thus restarting them. I thought this was fixed in the feature model, but apparently we get the same problem [2]. One option is to ask Oak to make a configuration required for some components [3], but IMO this does not fix the root cause. Oak is not to blame for the way we manage components and configurations.

2. Oak does not play nicely with Thread interrupt. Unfortunately this is by design [4].

3. We stop the Oak repository using Thread.interrupt [1].

As mentioned in the Jira issue, I am working on a version that stops the repository without using interrupts, but that alone will not fix the problem properly. If we initialise the repository without all the required services/initialisers being available, we will create and expose an inconsistent repository service, which leads to Sling being unable to start and to an incorrect repository state.

I am more and more convinced that we must be 100% sure that we have all the prerequisites for the repository service before we initialise it. The fix in Oak [3] will help, but I would have preferred a solution at the OSGi level.

Thanks,
Robert

[1]: https://issues.apache.org/jira/browse/SLING-7811
[2]: https://issues.apache.org/jira/browse/SLING-9118?focusedCommentId=17090938&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17090938
[3]: https://issues.apache.org/jira/browse/OAK-9047
[4]: https://jackrabbit.apache.org/oak/docs/dos_and_donts.html
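
To illustrate how points 2 and 3 combine into the "Merge interrupted" failure in the stack trace, here is a minimal JDK-only sketch. It is not Oak or Sling code: the class InterruptedMergeSketch and the methods merge and runAndInterrupt are hypothetical names made up for this example, and the semaphore only stands in for whatever the real scheduler waits on. The one thing it relies on is documented JDK behaviour: Semaphore.acquire() throws InterruptedException when the calling thread is interrupted before or while it waits.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicReference;

public class InterruptedMergeSketch {

    // Hypothetical stand-in for a commit that has to wait on a semaphore,
    // similar in shape to the Semaphore.acquire() frame in the stack trace.
    public static String merge(Semaphore commitSemaphore) {
        try {
            commitSemaphore.acquire(); // throws if the thread is or gets interrupted
            try {
                return "merged";
            } finally {
                commitSemaphore.release();
            }
        } catch (InterruptedException e) {
            // Restore the interrupt flag so callers can still observe it.
            Thread.currentThread().interrupt();
            return "Merge interrupted";
        }
    }

    // Starts a "startup" thread that attempts a merge, then interrupts it,
    // which is how the repository is stopped according to point 3.
    public static String runAndInterrupt() throws InterruptedException {
        Semaphore commitSemaphore = new Semaphore(0); // no permits, so merge() must wait
        AtomicReference<String> result = new AtomicReference<>();
        Thread startup = new Thread(() -> result.set(merge(commitSemaphore)), "startup-thread");
        startup.start();
        startup.interrupt(); // the stop request arrives while initialisation is in flight
        startup.join();
        return result.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runAndInterrupt());
    }
}
```

The interrupt reaches the merge no matter whether it arrives before or during the wait, so a repository that is restarted while its initial content is being committed cannot complete that commit. This is why replacing the interrupt with another stop mechanism only addresses part of the problem.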
