[ https://issues.apache.org/jira/browse/SLING-5602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Stefan Egli updated SLING-5602:
-------------------------------
             Assignee: Carsten Ziegeler  (was: Stefan Egli)
    Affects Version/s:     (was: Discovery Oak 1.2.6)
                       Resource Resolver 1.4.8
        Fix Version/s:     (was: Discovery Oak 1.2.8)
                       Resource Resolver 1.5.0
          Component/s:     (was: Extensions)
                       ResourceResolver

[~cziegeler], I believe this is indeed a resourceresolver issue, and indeed a blocker. Here's what I see happening after adding log statements to various resourceresolver classes and reproducing it (I can reproduce it every 2nd time - right after starting a launchpad it usually works fine, but the 2nd time it mostly hits this problem):
* in [ResourceResolverFactoryActivator.activate:485|https://github.com/apache/sling/blob/trunk/bundles/resourceresolver/src/main/java/org/apache/sling/resourceresolver/impl/ResourceResolverFactoryActivator.java#L485] the ResourceProviderTracker is activated - which itself starts a ServiceTracker
* then some register/unregister osgi magic happens in the activate but also, I believe, in concurrent threads
** these register/unregister events generate calls to providerAdded/providerRemoved that then propagate via checkFactoryPreconditions into {{registerFactory}}
* at some later time - unsynchronized with the background osgi thread that does service register/unregister (I saw one coming from the update of the config) - [FactoryPreconditions.activate|https://github.com/apache/sling/blob/trunk/bundles/resourceresolver/src/main/java/org/apache/sling/resourceresolver/impl/FactoryPreconditions.java#L50] is called - which from then on allows the checkPreconditions method to return true
** which at some - unsynchronized/unspecified - point allows checkFactoryPreconditions to succeed and call {{registerFactory}}
** *BUT* for some reason I've seen examples where *no* new unregister/register of eg the OakClusterViewService is done (to update the ResourceResolverFactory) - and in other cases it was updated with an outdated one (ie the ResourceResolverFactoryActivator had another idea of Tracker/Storage than what it was binding to the OakClusterViewService).

I haven't yet found the root cause - however I've seen this pattern where FactoryPreconditions.checkPreconditions sometimes repeatedly returns null (as it is deactivated) and only succeeds at the end of ResourceResolverFactoryActivator.activate - but those cases oftentimes then fail to correctly pass a valid ResourceResolverFactory to OakClusterViewService.

.. and I still believe it has nothing to do with discovery but is rather somewhere either in resourceresolver *or* in osgi/felix ..

.. will help digging more tomorrow, but thought I'd pass on this info to you as early as possible

> The Discovery module does not work any more after a ResourceResolverFactory reactivation
> ----------------------------------------------------------------------------------------
>
>                 Key: SLING-5602
>                 URL: https://issues.apache.org/jira/browse/SLING-5602
>             Project: Sling
>          Issue Type: Bug
>          Components: ResourceResolver
>    Affects Versions: Resource Resolver 1.4.8
>            Reporter: Radu Cotescu
>            Assignee: Carsten Ziegeler
>            Priority: Blocker
>             Fix For: Resource Resolver 1.5.0
>
>
> The Discovery module does not work any more after the Resource Resolver Factory is reconfigured.
> To reproduce this, start the latest launchpad (built from https://github.com/apache/sling/blob/c441d5b672d1952a82a1c9fe1e6d81e86cec0018/launchpad/builder/src/main/provisioning/sling.txt) and then:
> # go to http://localhost:8080/system/console/configMgr/org.apache.sling.jcr.resource.internal.JcrResourceResolverFactoryImpl
> # click on save (this will trigger the component's reactivation)
> # check the error log
> {noformat}
> 14.03.2016 16:29:57.331 *ERROR* [discovery.connectors.common.runner.7fd8d00a-802a-4367-a384-64024e28dbbc.discoveryLiteCheck] org.apache.sling.discovery.oak.cluster.OakClusterViewService getLocalClusterView: repository exception: java.lang.Exception: Could not adapt resourceResolver to session: org.apache.sling.resourceresolver.impl.ResourceResolverImpl@499d9cc9
> java.lang.Exception: Could not adapt resourceResolver to session: org.apache.sling.resourceresolver.impl.ResourceResolverImpl@499d9cc9
> 	at org.apache.sling.discovery.commons.providers.spi.base.DiscoveryLiteDescriptor.getDescriptorFrom(DiscoveryLiteDescriptor.java:41)
> 	at org.apache.sling.discovery.oak.cluster.OakClusterViewService.getLocalClusterView(OakClusterViewService.java:111)
> 	at org.apache.sling.discovery.base.commons.BaseDiscoveryService.getTopology(BaseDiscoveryService.java:77)
> 	at org.apache.sling.discovery.oak.OakDiscoveryService.checkForTopologyChange(OakDiscoveryService.java:657)
> 	at org.apache.sling.discovery.oak.pinger.OakViewChecker.discoveryLiteCheck(OakViewChecker.java:232)
> 	at org.apache.sling.discovery.oak.pinger.OakViewChecker.access$000(OakViewChecker.java:64)
> 	at org.apache.sling.discovery.oak.pinger.OakViewChecker$1.run(OakViewChecker.java:208)
> 	at org.apache.sling.discovery.base.commons.PeriodicBackgroundJob.safelyRun(PeriodicBackgroundJob.java:86)
> 	at org.apache.sling.discovery.base.commons.PeriodicBackgroundJob.run(PeriodicBackgroundJob.java:77)
> 	at java.lang.Thread.run(Thread.java:745)
> 14.03.2016 16:29:57.332 *INFO* [discovery.connectors.common.runner.7fd8d00a-802a-4367-a384-64024e28dbbc.discoveryLiteCheck] org.apache.sling.discovery.base.commons.BaseDiscoveryService getTopology: undefined cluster view: REPOSITORY_EXCEPTION] org.apache.sling.discovery.base.commons.UndefinedClusterViewException: Exception while processing descriptor: java.lang.Exception: Could not adapt resourceResolver to session: org.apache.sling.resourceresolver.impl.ResourceResolverImpl@499d9cc9
> 14.03.2016 16:29:57.332 *INFO* [discovery.connectors.common.runner.7fd8d00a-802a-4367-a384-64024e28dbbc.discoveryLiteCheck] org.apache.sling.discovery.commons.providers.base.ViewStateManagerImpl enqueueForAll: sending topologyEvent TopologyEvent [type=TOPOLOGY_CHANGING, oldView=DefaultTopologyView[current=false, num=1, instances=7fd8d00a-802a-4367-a384-64024e28dbbc[local=true,leader=true]], newView=null], to all (5) listeners
> 14.03.2016 16:29:57.332 *ERROR* [Discovery-AsyncEventSender] org.apache.sling.discovery.oak.TopologyWebConsolePlugin addDiscoveryLiteHistoryEntry: Exception: java.lang.Exception: Could not adapt resourceResolver to session: org.apache.sling.resourceresolver.impl.ResourceResolverImpl@149e86f0
> java.lang.Exception: Could not adapt resourceResolver to session: org.apache.sling.resourceresolver.impl.ResourceResolverImpl@149e86f0
> 	at org.apache.sling.discovery.commons.providers.spi.base.DiscoveryLiteDescriptor.getDescriptorFrom(DiscoveryLiteDescriptor.java:41)
> 	at org.apache.sling.discovery.oak.TopologyWebConsolePlugin.updateDiscoveryLiteHistory(TopologyWebConsolePlugin.java:771)
> 	at org.apache.sling.discovery.oak.TopologyWebConsolePlugin.handleTopologyEvent(TopologyWebConsolePlugin.java:722)
> 	at org.apache.sling.discovery.commons.providers.base.AsyncTopologyEvent.trigger(AsyncTopologyEvent.java:53)
> 	at org.apache.sling.discovery.commons.providers.base.AsyncEventSender.run(AsyncEventSender.java:118)
> 	at java.lang.Thread.run(Thread.java:745)
> {noformat}
> The Discovery module will not recover from this state. Furthermore, it will also prevent the RRF from reactivating, which basically makes the instance unusable.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)