[jira] [Updated] (SLING-5602) The Discovery module does not work any more after a ResourceResolverFactory reactivation

Stefan Egli (JIRA) Sat, 19 Mar 2016 01:06:58 -0700

     [ 
https://issues.apache.org/jira/browse/SLING-5602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Stefan Egli updated SLING-5602:
-------------------------------
             Assignee: Carsten Ziegeler  (was: Stefan Egli)
    Affects Version/s:     (was: Discovery Oak 1.2.6)
                       Resource Resolver 1.4.8
        Fix Version/s:     (was: Discovery Oak 1.2.8)
                       Resource Resolver 1.5.0
          Component/s:     (was: Extensions)
                       ResourceResolver

[~cziegeler], I believe this is indeed a resourceresolver issue, and indeed a 
blocker. Here's what I see happening after adding log statements to various 
resourceresolver classes and reproducing it (I can reproduce it every 2nd time 
- right after starting a launchpad it usually works fine, but the 2nd time it 
mostly hits this problem):

* in 
[ResourceResolverFactoryActivator.activate:485|https://github.com/apache/sling/blob/trunk/bundles/resourceresolver/src/main/java/org/apache/sling/resourceresolver/impl/ResourceResolverFactoryActivator.java#L485]
 the ResourceProviderTracker is activated - which itself starts a 
ServiceTracker 
* then some register/unregister osgi magic happens in the activate but also in 
concurrent threads I believe
** these register/unregister generate calls to providerAdded/providerRemoved 
that then propagate via checkFactoryPreconditions into {{registerFactory}}
* at some later time - unsynchronized with the background osgi thread that does 
service register/unregister (I saw one coming from the update of the config) - 
the 
[FactoryPreconditions.activate|https://github.com/apache/sling/blob/trunk/bundles/resourceresolver/src/main/java/org/apache/sling/resourceresolver/impl/FactoryPreconditions.java#L50]
 is called - which from then on allows the checkPreconditions method to return 
true
** which at some - unsynchronized/unspecified - point allows 
checkFactoryPreconditions to succeed and call {{registerFactory}}
** *BUT* for some reason I've seen examples where *no* new unregister/register 
of eg the OakClusterViewService is done (to update the ResourceResolverFactory) 
- and in other cases it was updated with an outdated one (ie the 
ResourceResolverFactoryActivator had another idea of Tracker/Storage than what 
it was binding to the OakClusterViewService).
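
To make the ordering concern in the list above concrete, here is a minimal, single-threaded sketch of how a registration can be lost when a provider event arrives before the preconditions are activated. It is a simplified model and not the actual Sling code: the classes {{Preconditions}} and {{Activator}}, the {{simulate}} helper and the {{recheckAfterActivate}} flag are hypothetical names made up for illustration, and the sketch does not claim to be the root cause of this issue.

```java
import java.util.ArrayList;
import java.util.List;

public class PreconditionRaceSketch {

    /** Hypothetical stand-in for FactoryPreconditions: the check fails until activated. */
    static class Preconditions {
        private boolean active;
        private boolean providerPresent;

        synchronized void activate() { active = true; }
        synchronized void providerAdded() { providerPresent = true; }
        synchronized boolean check() { return active && providerPresent; }
    }

    /** Hypothetical stand-in for the activator that registers the factory. */
    static class Activator {
        final Preconditions preconditions = new Preconditions();
        // Records which trigger led to a (simulated) factory registration.
        final List<String> registrations = new ArrayList<>();

        /** Simulates a provider event delivered by a service tracker. */
        void providerAdded() {
            preconditions.providerAdded();
            checkFactoryPreconditions("providerAdded");
        }

        /** Simulates component activation, optionally re-checking afterwards. */
        void activate(boolean recheckAfterActivate) {
            preconditions.activate();
            if (recheckAfterActivate) {
                checkFactoryPreconditions("activate");
            }
        }

        private void checkFactoryPreconditions(String trigger) {
            // Register at most once, and only when the preconditions hold.
            if (preconditions.check() && registrations.isEmpty()) {
                registrations.add(trigger);
            }
        }
    }

    /** Replays one fixed ordering: provider event first, activation second. */
    static List<String> simulate(boolean recheckAfterActivate) {
        Activator activator = new Activator();
        activator.providerAdded();                // arrives while still inactive
        activator.activate(recheckAfterActivate); // preconditions become active
        return activator.registrations;
    }

    public static void main(String[] args) {
        System.out.println("without re-check: " + simulate(false));
        System.out.println("with re-check:    " + simulate(true));
    }
}
```

In this model the early provider event is dropped because the check fails while the preconditions are inactive, so the factory is only registered if something re-runs the check after activation. In the real bundle the events come from concurrent OSGi threads, so the ordering is not fixed as it is here.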

I haven't found the root cause yet. However, I've seen a pattern where 
FactoryPreconditions.checkPreconditions repeatedly returns false (as it is 
deactivated) and only succeeds at the end of 
ResourceResolverFactoryActivator.activate - but those cases often then fail to 
pass a valid ResourceResolverFactory to OakClusterViewService.

.. and I still believe it has nothing to do with discovery but is rather 
somewhere either in resourceresolver *or* in osgi/felix ..

.. I'll help dig further tomorrow, but thought I'd pass on this info to you as 
early as possible

> The Discovery module does not work any more after a ResourceResolverFactory 
> reactivation
> ----------------------------------------------------------------------------------------
>
>                 Key: SLING-5602
>                 URL: https://issues.apache.org/jira/browse/SLING-5602
>             Project: Sling
>          Issue Type: Bug
>          Components: ResourceResolver
>    Affects Versions: Resource Resolver 1.4.8
>            Reporter: Radu Cotescu
>            Assignee: Carsten Ziegeler
>            Priority: Blocker
>             Fix For: Resource Resolver 1.5.0
>
>
> The Discovery module does not work any more after the Resource Resolver 
> Factory is reconfigured. To reproduce this start the latest launchpad (built 
> from 
> https://github.com/apache/sling/blob/c441d5b672d1952a82a1c9fe1e6d81e86cec0018/launchpad/builder/src/main/provisioning/sling.txt)
>  and then:
> # go to 
> http://localhost:8080/system/console/configMgr/org.apache.sling.jcr.resource.internal.JcrResourceResolverFactoryImpl
> # click on save (this will trigger the component's reactivation)
> # check the error log
> {noformat}
> 14.03.2016 16:29:57.331 *ERROR* 
> [discovery.connectors.common.runner.7fd8d00a-802a-4367-a384-64024e28dbbc.discoveryLiteCheck]
>  org.apache.sling.discovery.oak.cluster.OakClusterViewService 
> getLocalClusterView: repository exception: java.lang.Exception: Could not 
> adapt resourceResolver to session: 
> org.apache.sling.resourceresolver.impl.ResourceResolverImpl@499d9cc9
> java.lang.Exception: Could not adapt resourceResolver to session: 
> org.apache.sling.resourceresolver.impl.ResourceResolverImpl@499d9cc9
>       at 
> org.apache.sling.discovery.commons.providers.spi.base.DiscoveryLiteDescriptor.getDescriptorFrom(DiscoveryLiteDescriptor.java:41)
>       at 
> org.apache.sling.discovery.oak.cluster.OakClusterViewService.getLocalClusterView(OakClusterViewService.java:111)
>       at 
> org.apache.sling.discovery.base.commons.BaseDiscoveryService.getTopology(BaseDiscoveryService.java:77)
>       at 
> org.apache.sling.discovery.oak.OakDiscoveryService.checkForTopologyChange(OakDiscoveryService.java:657)
>       at 
> org.apache.sling.discovery.oak.pinger.OakViewChecker.discoveryLiteCheck(OakViewChecker.java:232)
>       at 
> org.apache.sling.discovery.oak.pinger.OakViewChecker.access$000(OakViewChecker.java:64)
>       at 
> org.apache.sling.discovery.oak.pinger.OakViewChecker$1.run(OakViewChecker.java:208)
>       at 
> org.apache.sling.discovery.base.commons.PeriodicBackgroundJob.safelyRun(PeriodicBackgroundJob.java:86)
>       at 
> org.apache.sling.discovery.base.commons.PeriodicBackgroundJob.run(PeriodicBackgroundJob.java:77)
>       at java.lang.Thread.run(Thread.java:745)
> 14.03.2016 16:29:57.332 *INFO* 
> [discovery.connectors.common.runner.7fd8d00a-802a-4367-a384-64024e28dbbc.discoveryLiteCheck]
>  org.apache.sling.discovery.base.commons.BaseDiscoveryService getTopology: 
> undefined cluster view: REPOSITORY_EXCEPTION] 
> org.apache.sling.discovery.base.commons.UndefinedClusterViewException: 
> Exception while processing descriptor: java.lang.Exception: Could not adapt 
> resourceResolver to session: 
> org.apache.sling.resourceresolver.impl.ResourceResolverImpl@499d9cc9
> 14.03.2016 16:29:57.332 *INFO* 
> [discovery.connectors.common.runner.7fd8d00a-802a-4367-a384-64024e28dbbc.discoveryLiteCheck]
>  org.apache.sling.discovery.commons.providers.base.ViewStateManagerImpl 
> enqueueForAll: sending topologyEvent TopologyEvent [type=TOPOLOGY_CHANGING, 
> oldView=DefaultTopologyView[current=false, num=1, 
> instances=7fd8d00a-802a-4367-a384-64024e28dbbc[local=true,leader=true]], 
> newView=null], to all (5) listeners
> 14.03.2016 16:29:57.332 *ERROR* [Discovery-AsyncEventSender] 
> org.apache.sling.discovery.oak.TopologyWebConsolePlugin 
> addDiscoveryLiteHistoryEntry: Exception: java.lang.Exception: Could not adapt 
> resourceResolver to session: 
> org.apache.sling.resourceresolver.impl.ResourceResolverImpl@149e86f0
> java.lang.Exception: Could not adapt resourceResolver to session: 
> org.apache.sling.resourceresolver.impl.ResourceResolverImpl@149e86f0
>       at 
> org.apache.sling.discovery.commons.providers.spi.base.DiscoveryLiteDescriptor.getDescriptorFrom(DiscoveryLiteDescriptor.java:41)
>       at 
> org.apache.sling.discovery.oak.TopologyWebConsolePlugin.updateDiscoveryLiteHistory(TopologyWebConsolePlugin.java:771)
>       at 
> org.apache.sling.discovery.oak.TopologyWebConsolePlugin.handleTopologyEvent(TopologyWebConsolePlugin.java:722)
>       at 
> org.apache.sling.discovery.commons.providers.base.AsyncTopologyEvent.trigger(AsyncTopologyEvent.java:53)
>       at 
> org.apache.sling.discovery.commons.providers.base.AsyncEventSender.run(AsyncEventSender.java:118)
>       at java.lang.Thread.run(Thread.java:745)
> {noformat}
> The Discovery module will not recover from this state. Furthermore, it also 
> prevents the RRF from reactivating, which basically makes the instance unusable.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
