[
https://issues.apache.org/jira/browse/SLING-5602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Stefan Egli updated SLING-5602:
-------------------------------
Assignee: Carsten Ziegeler (was: Stefan Egli)
Affects Version/s: Resource Resolver 1.4.8 (was: Discovery Oak 1.2.6)
Fix Version/s: Resource Resolver 1.5.0 (was: Discovery Oak 1.2.8)
Component/s: ResourceResolver (was: Extensions)
[~cziegeler], I believe this is indeed a resourceresolver issue, and indeed a
blocker. Here's what I see happening after adding log statements to various
resourceresolver classes and reproducing it (I can reproduce it every 2nd time:
right after starting a launchpad it usually works fine, but the 2nd time it
mostly hits this problem):
* in
[ResourceResolverFactoryActivator.activate:485|https://github.com/apache/sling/blob/trunk/bundles/resourceresolver/src/main/java/org/apache/sling/resourceresolver/impl/ResourceResolverFactoryActivator.java#L485]
the ResourceProviderTracker is activated - which itself starts a
ServiceTracker
* then some OSGi service register/unregister activity happens, in the activate
but, I believe, also in concurrent threads
** these register/unregister events generate calls to
providerAdded/providerRemoved, which then propagate via
checkFactoryPreconditions into {{registerFactory}}
* at some later time - unsynchronized with the background osgi thread that does
service register/unregister (I saw one coming from the update of the config) -
the
[FactoryPreconditions.activate|https://github.com/apache/sling/blob/trunk/bundles/resourceresolver/src/main/java/org/apache/sling/resourceresolver/impl/FactoryPreconditions.java#L50]
is called - which from then on allows the checkPreconditions method to return
true
** which at some unsynchronized/unspecified point allows
checkFactoryPreconditions to succeed and call {{registerFactory}}
** *BUT* for some reason I've seen examples where *no* new unregister/register
of e.g. the OakClusterViewService is done (to update the ResourceResolverFactory),
and in other cases it was updated with an outdated one (i.e. the
ResourceResolverFactoryActivator had a different idea of Tracker/Storage than
what was being bound to the OakClusterViewService).
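To make the ordering above easier to reason about, here is a minimal sketch of one way to keep the registered factory from lagging behind the tracker state. It is not the actual resourceresolver code and not a proposed patch: {{FactoryRegistrationSketch}}, {{FactoryConsumer}}, {{providerChanged}}, {{activatePreconditions}} and the generation counters are all hypothetical names that only mirror the roles described above (provider add/remove callbacks, FactoryPreconditions.activate, and a consumer such as OakClusterViewService). The assumption is that every state change goes through one lock and re-evaluates the preconditions, so a consumer is re-bound whenever the tracker state has moved on.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model of the activation ordering; none of these names are
 * Sling APIs. A "generation" stands in for the Tracker/Storage state.
 */
final class FactoryRegistrationSketch {

    /** Role played by a consumer such as OakClusterViewService. */
    interface FactoryConsumer {
        void bind(int factoryGeneration);
    }

    private final Object lock = new Object();
    private final List<FactoryConsumer> consumers = new ArrayList<>();
    private boolean preconditionsActive;    // flipped by activatePreconditions()
    private int trackerGeneration;          // bumped on every provider change
    private int registeredGeneration = -1;  // -1: no factory registered yet

    void addConsumer(FactoryConsumer consumer) {
        synchronized (lock) {
            consumers.add(consumer);
        }
    }

    /** Mirrors providerAdded/providerRemoved: the provider set changed. */
    void providerChanged() {
        synchronized (lock) {
            trackerGeneration++;
            checkFactoryPreconditions();
        }
    }

    /** Mirrors FactoryPreconditions.activate: preconditions may now pass. */
    void activatePreconditions() {
        synchronized (lock) {
            preconditionsActive = true;
            checkFactoryPreconditions();
        }
    }

    /** Must be called with the lock held; re-binds if the factory is stale. */
    private void checkFactoryPreconditions() {
        if (!preconditionsActive || registeredGeneration == trackerGeneration) {
            return;
        }
        registeredGeneration = trackerGeneration;
        for (FactoryConsumer consumer : consumers) {
            consumer.bind(registeredGeneration);
        }
    }

    int registeredGeneration() {
        synchronized (lock) {
            return registeredGeneration;
        }
    }
}
```

In this model, provider changes that arrive before activation do not register anything, activation registers a factory for the latest tracker state, and any later provider change re-binds the consumer. Note that the sketch calls consumers while holding the lock, which keeps it short but would be a deadlock risk with real OSGi service callbacks.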
I haven't found the actual root cause yet. However, I've seen a pattern where
FactoryPreconditions.checkPreconditions sometimes repeatedly returns null
(as it is deactivated) and only succeeds at the end of
ResourceResolverFactoryActivator.activate, but those cases often then fail to
correctly pass a valid ResourceResolverFactory to OakClusterViewService.
.. and I still believe it has nothing to do with discovery but is rather
somewhere either in resourceresolver *or* in osgi/felix ..
.. will help dig more tomorrow, but thought I'd pass on this info to you as
early as possible
> The Discovery module does not work any more after a ResourceResolverFactory
> reactivation
> ----------------------------------------------------------------------------------------
>
> Key: SLING-5602
> URL: https://issues.apache.org/jira/browse/SLING-5602
> Project: Sling
> Issue Type: Bug
> Components: ResourceResolver
> Affects Versions: Resource Resolver 1.4.8
> Reporter: Radu Cotescu
> Assignee: Carsten Ziegeler
> Priority: Blocker
> Fix For: Resource Resolver 1.5.0
>
>
> The Discovery module does not work any more after the Resource Resolver
> Factory is reconfigured. To reproduce this start the latest launchpad (built
> from
> https://github.com/apache/sling/blob/c441d5b672d1952a82a1c9fe1e6d81e86cec0018/launchpad/builder/src/main/provisioning/sling.txt)
> and then:
> # go to
> http://localhost:8080/system/console/configMgr/org.apache.sling.jcr.resource.internal.JcrResourceResolverFactoryImpl
> # click on save (this will trigger the component's reactivation)
> # check the error log
> {noformat}
> 14.03.2016 16:29:57.331 *ERROR*
> [discovery.connectors.common.runner.7fd8d00a-802a-4367-a384-64024e28dbbc.discoveryLiteCheck]
> org.apache.sling.discovery.oak.cluster.OakClusterViewService
> getLocalClusterView: repository exception: java.lang.Exception: Could not
> adapt resourceResolver to session:
> org.apache.sling.resourceresolver.impl.ResourceResolverImpl@499d9cc9
> java.lang.Exception: Could not adapt resourceResolver to session:
> org.apache.sling.resourceresolver.impl.ResourceResolverImpl@499d9cc9
> at
> org.apache.sling.discovery.commons.providers.spi.base.DiscoveryLiteDescriptor.getDescriptorFrom(DiscoveryLiteDescriptor.java:41)
> at
> org.apache.sling.discovery.oak.cluster.OakClusterViewService.getLocalClusterView(OakClusterViewService.java:111)
> at
> org.apache.sling.discovery.base.commons.BaseDiscoveryService.getTopology(BaseDiscoveryService.java:77)
> at
> org.apache.sling.discovery.oak.OakDiscoveryService.checkForTopologyChange(OakDiscoveryService.java:657)
> at
> org.apache.sling.discovery.oak.pinger.OakViewChecker.discoveryLiteCheck(OakViewChecker.java:232)
> at
> org.apache.sling.discovery.oak.pinger.OakViewChecker.access$000(OakViewChecker.java:64)
> at
> org.apache.sling.discovery.oak.pinger.OakViewChecker$1.run(OakViewChecker.java:208)
> at
> org.apache.sling.discovery.base.commons.PeriodicBackgroundJob.safelyRun(PeriodicBackgroundJob.java:86)
> at
> org.apache.sling.discovery.base.commons.PeriodicBackgroundJob.run(PeriodicBackgroundJob.java:77)
> at java.lang.Thread.run(Thread.java:745)
> 14.03.2016 16:29:57.332 *INFO*
> [discovery.connectors.common.runner.7fd8d00a-802a-4367-a384-64024e28dbbc.discoveryLiteCheck]
> org.apache.sling.discovery.base.commons.BaseDiscoveryService getTopology:
> undefined cluster view: REPOSITORY_EXCEPTION]
> org.apache.sling.discovery.base.commons.UndefinedClusterViewException:
> Exception while processing descriptor: java.lang.Exception: Could not adapt
> resourceResolver to session:
> org.apache.sling.resourceresolver.impl.ResourceResolverImpl@499d9cc9
> 14.03.2016 16:29:57.332 *INFO*
> [discovery.connectors.common.runner.7fd8d00a-802a-4367-a384-64024e28dbbc.discoveryLiteCheck]
> org.apache.sling.discovery.commons.providers.base.ViewStateManagerImpl
> enqueueForAll: sending topologyEvent TopologyEvent [type=TOPOLOGY_CHANGING,
> oldView=DefaultTopologyView[current=false, num=1,
> instances=7fd8d00a-802a-4367-a384-64024e28dbbc[local=true,leader=true]],
> newView=null], to all (5) listeners
> 14.03.2016 16:29:57.332 *ERROR* [Discovery-AsyncEventSender]
> org.apache.sling.discovery.oak.TopologyWebConsolePlugin
> addDiscoveryLiteHistoryEntry: Exception: java.lang.Exception: Could not adapt
> resourceResolver to session:
> org.apache.sling.resourceresolver.impl.ResourceResolverImpl@149e86f0
> java.lang.Exception: Could not adapt resourceResolver to session:
> org.apache.sling.resourceresolver.impl.ResourceResolverImpl@149e86f0
> at
> org.apache.sling.discovery.commons.providers.spi.base.DiscoveryLiteDescriptor.getDescriptorFrom(DiscoveryLiteDescriptor.java:41)
> at
> org.apache.sling.discovery.oak.TopologyWebConsolePlugin.updateDiscoveryLiteHistory(TopologyWebConsolePlugin.java:771)
> at
> org.apache.sling.discovery.oak.TopologyWebConsolePlugin.handleTopologyEvent(TopologyWebConsolePlugin.java:722)
> at
> org.apache.sling.discovery.commons.providers.base.AsyncTopologyEvent.trigger(AsyncTopologyEvent.java:53)
> at
> org.apache.sling.discovery.commons.providers.base.AsyncEventSender.run(AsyncEventSender.java:118)
> at java.lang.Thread.run(Thread.java:745)
> {noformat}
> The Discovery module will not recover from this state. Furthermore, it also
> prevents the RRF from reactivating, which basically makes the instance
> unusable.