[
https://issues.apache.org/jira/browse/FELIX-4883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14563177#comment-14563177
]
Rob Ryan commented on FELIX-4883:
---------------------------------
It is proving difficult to provoke the failure through an isolated test case.
But code inspection and testing with variations of the code have convinced me
the situation goes like this:
ServiceRegistration.setProperties checks up front for a valid service object,
but it lets go of its lock on the ServiceRegistrationImpl before it calls the
ServiceListeners. That means that the ServiceRegistrationImpl delivered to a
ServiceListener can become invalid before the serviceChanged method of a
listener is called. This is coupled with an inappropriate assumption in
org.apache.felix.scr.impl.manager.ServiceTracker<S, T>. In
org.apache.felix.scr.impl.manager.ServiceTracker.Tracked.serviceChanged(ServiceEvent):
    case ServiceEvent.REGISTERED :
    case ServiceEvent.MODIFIED :
        track(reference, event);
This means that delivery of a ServiceEvent.MODIFIED can have the same effect as
a ServiceEvent.REGISTERED, and can re-track a service registration that is
actually already invalid!
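
To make the sequence above concrete, here is a minimal, deterministic Java
sketch of the interleaving. It is only a model under stated assumptions: the
types Registration, NaiveTracker and EventType are hypothetical stand-ins, not
the Felix classes, and the "interleaved" Runnable merely simulates another
thread unregistering the service in the gap between the validity check and the
listener callback.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model; none of these types are the Felix classes.
public class ModifiedRaceSketch {

    enum EventType { REGISTERED, MODIFIED, UNREGISTERING }

    interface Listener {
        void serviceChanged(EventType type, Registration reg);
    }

    static class Registration {
        private Object service = new Object(); // plays the role of m_svcObj
        private final Object bundle = "bundle-A";

        synchronized boolean isValid() { return service != null; }
        synchronized Object getBundle() { return isValid() ? bundle : null; }
        synchronized void invalidate() { service = null; }

        // Validity is checked under the lock, but listeners run outside it.
        void setProperties(Listener listener, Runnable interleaved) {
            synchronized (this) {
                if (!isValid()) {
                    throw new IllegalStateException("already unregistered");
                }
            }
            interleaved.run(); // stands in for another thread running in the gap
            listener.serviceChanged(EventType.MODIFIED, this);
        }

        void unregister(Listener listener) {
            listener.serviceChanged(EventType.UNREGISTERING, this);
            invalidate();
        }
    }

    // Treats MODIFIED exactly like REGISTERED, as in the snippet quoted above.
    static class NaiveTracker implements Listener {
        final List<Registration> tracked = new ArrayList<>();

        @Override
        public void serviceChanged(EventType type, Registration reg) {
            switch (type) {
                case REGISTERED:
                case MODIFIED:
                    if (!tracked.contains(reg)) {
                        tracked.add(reg);
                    }
                    break;
                case UNREGISTERING:
                    tracked.remove(reg);
                    break;
            }
        }
    }

    // Returns true if the tracker ends up holding an invalid registration.
    static boolean reproduce() {
        NaiveTracker tracker = new NaiveTracker();
        Registration reg = new Registration();
        tracker.serviceChanged(EventType.REGISTERED, reg);
        // The unregistration lands between the check and the MODIFIED delivery.
        reg.setProperties(tracker, () -> reg.unregister(tracker));
        return tracker.tracked.contains(reg) && reg.getBundle() == null;
    }

    public static void main(String[] args) {
        System.out.println("tracks invalid registration: " + reproduce());
    }
}
```

In this model the tracker still holds the registration after it has been
invalidated, which matches the symptom in the issue: a tracked reference whose
getBundle() returns null.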
The bottom line, I think, is that code using ServiceRegistration *and*
ServiceReference needs to be aware that any time a lock is not held on the
ServiceRegistration, the object can become invalid, at which point the bundle
becomes unavailable (getBundle returns null).
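
For callers, one defensive pattern is to read getBundle() once into a local
and treat null as "already unregistered". The sketch below is illustrative
only: Ref is a hypothetical stand-in for a service reference (with a String in
place of a Bundle so the example is self-contained), and describe is an
invented helper, not part of SCR.

```java
import java.util.Optional;

public class NullSafeBundleSketch {

    // Hypothetical stand-in for a reference whose getBundle() may become null.
    interface Ref {
        String getBundle();
    }

    // Reads getBundle() exactly once, so a concurrent unregistration cannot
    // turn a successful null check into a later NullPointerException.
    static Optional<String> describe(Ref ref) {
        String bundle = ref.getBundle();
        if (bundle == null) {
            return Optional.empty(); // service was unregistered; skip it
        }
        return Optional.of("registered by " + bundle);
    }

    public static void main(String[] args) {
        System.out.println(describe(() -> "bundle-A").isPresent());
        System.out.println(describe(() -> null).isPresent());
    }
}
```

Calling getBundle() twice (once to check, once to use) would reopen the same
window, since the reference can become invalid between the two calls.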
While I haven't unraveled all the implications for the ServiceTracker
implementation yet, it appears some additional synchronization is necessary
within the ServiceTracker to ensure that service registration changes from
separate threads don't result in an invalid state of the system.
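
As a rough idea of what such synchronization might look like, the sketch below
serializes event handling in the tracker and refuses to (re)track a reference
that is already invalid. This is an assumption-laden illustration, not the
actual fix: GuardedTracker, Ref and FakeRef are hypothetical names, and a
validity check alone may narrow the window without closing it if a reference
is still valid when a late MODIFIED arrives.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class GuardedTrackerSketch {

    enum EventType { REGISTERED, MODIFIED, UNREGISTERING }

    // Hypothetical stand-in for a service reference.
    interface Ref {
        Object getBundle();
    }

    static class GuardedTracker {
        private final Set<Ref> tracked = new LinkedHashSet<>();

        // All events are handled under the tracker's own lock.
        synchronized void serviceChanged(EventType type, Ref ref) {
            switch (type) {
                case REGISTERED:
                case MODIFIED:
                    if (ref.getBundle() != null) {
                        tracked.add(ref);
                    } else {
                        tracked.remove(ref); // invalid: never (re)track it
                    }
                    break;
                case UNREGISTERING:
                    tracked.remove(ref);
                    break;
            }
        }

        synchronized int size() {
            return tracked.size();
        }
    }

    // Test double whose bundle can be cleared to simulate invalidation.
    static class FakeRef implements Ref {
        volatile Object bundle = "bundle-A";

        @Override
        public Object getBundle() {
            return bundle;
        }
    }

    // Delivers MODIFIED after the reference was unregistered and invalidated.
    static int lateModifiedScenario() {
        GuardedTracker tracker = new GuardedTracker();
        FakeRef ref = new FakeRef();
        tracker.serviceChanged(EventType.REGISTERED, ref);
        tracker.serviceChanged(EventType.UNREGISTERING, ref);
        ref.bundle = null;
        tracker.serviceChanged(EventType.MODIFIED, ref); // late event
        return tracker.size();
    }

    public static void main(String[] args) {
        System.out.println("tracked after late MODIFIED: " + lateModifiedScenario());
    }
}
```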
> ServiceComponentRuntime.getComponentConfigurationDTOs NullPointerException
> --------------------------------------------------------------------------
>
> Key: FELIX-4883
> URL: https://issues.apache.org/jira/browse/FELIX-4883
> Project: Felix
> Issue Type: Bug
> Components: Declarative Services (SCR)
> Environment: Linux, Sling, Adobe CQ, org.apache.felix.scr version
> 1.8.3-R1658944
> Reporter: Rob Ryan
> Assignee: David Bosschaert
> Priority: Minor
> Fix For: scr-2.0.0
>
> Attachments: scrtest.zip
>
>
> In our test automation we install a large set of bundles after our (also large)
> 'main' app starts up. This causes significant churn as bundles and components
> are stopped and potentially new versions are started. Unfortunately the code
> involved is not open source, so I cannot deliver the full data required to
> reproduce the failure described here.
> What I can share is that after all this churn of bundles and components being
> stopped and started the ScrComponentRuntime service starts to fail with a
> NullPointerException in getComponentConfigurationDTOs. This was initially
> noticed as an NPE being reported when visiting the felix console at
> /system/console/components.
> The stack at the point of failure is:
> java.lang.NullPointerException
> at org.apache.felix.scr.impl.runtime.ServiceComponentRuntimeImpl.serviceReferenceToDTO(ServiceComponentRuntimeImpl.java:205)
> at org.apache.felix.scr.impl.runtime.ServiceComponentRuntimeImpl.satisfiedRefManagersToDTO(ServiceComponentRuntimeImpl.java:169)
> at org.apache.felix.scr.impl.runtime.ServiceComponentRuntimeImpl.managerToConfiguration(ServiceComponentRuntimeImpl.java:145)
> at org.apache.felix.scr.impl.runtime.ServiceComponentRuntimeImpl.getComponentConfigurationDTOs(ServiceComponentRuntimeImpl.java:119)
> at com.robr.incqtest.test.ScrTest.test(ScrTest.java:37)
> ...
> The NPE occurs because an
> org.apache.felix.framework.ServiceRegistrationImpl.ServiceReferenceImpl being
> processed at line 205 of
> org.apache.felix.scr.impl.runtime.ServiceComponentRuntimeImpl.serviceReferenceToDTO(org.osgi.framework.ServiceReference<?>)
> has an m_svcObj of null. So even though the bundle is actually available in
> the object, the getBundle() method returns null.
> [~cziegeler] [~bosschaert] I can investigate to narrow this down further,
> but any pointers would be much appreciated.