[ 
https://issues.apache.org/jira/browse/SLING-5457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15858099#comment-15858099
 ] 

Karl Pauls commented on SLING-5457:
-----------------------------------

I think I can see what is going on: while the installer is active, a start 
level change is happening at the same time, and the two are racing for 
the same bundle.

As a result, the interaction sometimes looks like this:
Bundle: ACTIVE
Installer: stop bundle
Bundle: STOPPED
Startlevel: start bundle
Bundle: STARTING
Installer: update bundle
Exception: bundle STARTING
Bundle: ACTIVE

More generally, this can happen with any two management agents racing for the 
same bundle in this sequence. The bundle update does not wait for a bundle 
that is in the STOPPING or STARTING state. Instead, as mentioned in the 
issue, an exception is thrown. I think that is actually a bug in Felix 
(technically, it’s more of a missing feature, but that is beside the point), as 
newer versions of the spec mandate that on an update the framework should wait 
for bundles that are STOPPING or STARTING. Hence, the real fix for this issue 
is to implement that behaviour in the Felix framework.
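
To illustrate the "wait instead of throw" idea, here is a rough sketch. It is 
not Felix code, and the framework would do this internally rather than by 
polling. The class and method names (TransitionalStateWait, awaitStableState) 
are made up for illustration, and the state is read through a plain 
IntSupplier so the snippet runs without an OSGi framework; with a real bundle 
one would pass bundle::getState. The constants mirror the values of 
org.osgi.framework.Bundle.STARTING, STOPPING and ACTIVE.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.IntSupplier;

final class TransitionalStateWait {
    // Same numeric values as the constants in org.osgi.framework.Bundle.
    static final int STARTING = 0x08;
    static final int STOPPING = 0x10;
    static final int ACTIVE = 0x20;

    static boolean isTransitional(int state) {
        return state == STARTING || state == STOPPING;
    }

    /**
     * Polls the state until it is neither STARTING nor STOPPING.
     * Returns true once the state is stable, false on timeout or interrupt.
     */
    static boolean awaitStableState(IntSupplier state, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (isTransitional(state.getAsInt())) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // still STARTING/STOPPING after the timeout
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
        return true;
    }
}
```

A caller would only attempt the update when awaitStableState returns true, and 
report an error otherwise, instead of failing immediately on STARTING.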

In addition, though, I think that this specific interaction between the start 
level change and the installer is somewhat unfortunate. It would probably be 
worthwhile for the installer to try to be active only when there is no start 
level change going on (I remember that there was some other bug report on the 
Sling dev list recently that I suspect might be related to this interaction).

Implementing a retry as proposed here should be OK as a short-term band-aid. 
Ultimately, I’d say this should be addressed by an improved Felix framework and 
possibly a better handling of start level changes by the installer.
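
For reference, a bounded retry along the lines proposed in the issue ("maybe 3 
times at max") could look roughly like the sketch below. This is not the 
installer's actual code: BoundedRetry, Task and retry are hypothetical names, 
and the delay between attempts is an assumption. In the installer, the task 
would wrap the bundle update call.

```java
final class BoundedRetry {
    /** Hypothetical stand-in for an operation such as a bundle update. */
    interface Task {
        void run() throws Exception;
    }

    /**
     * Runs the task up to maxAttempts times, sleeping delayMillis between
     * failed attempts. Returns true on the first success, false if every
     * attempt failed or the thread was interrupted while waiting.
     */
    static boolean retry(Task task, int maxAttempts, long delayMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                task.run();
                return true;
            } catch (Exception e) {
                if (attempt == maxAttempts) {
                    return false; // out of attempts, give up
                }
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // preserve the interrupt flag
                    return false;
                }
            }
        }
        return false; // only reached when maxAttempts < 1
    }
}
```

A real implementation would probably retry only on the specific 
BundleException seen here rather than on every exception, and would log the 
final failure.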

I created FELIX-5528 to try to address this in the framework (and to try 
to improve the error message as part of FELIX-5138).

> OsgiInstaller should retry to start bundles on failures
> -------------------------------------------------------
>
>                 Key: SLING-5457
>                 URL: https://issues.apache.org/jira/browse/SLING-5457
>             Project: Sling
>          Issue Type: Bug
>          Components: Installer
>    Affects Versions: Installer Core 3.6.4
>            Reporter: Jörg Hoh
>
> The OsgiInstaller doesn't update a bundle properly, if there's an exception 
> from the framework.
> I have this exception:
> {code}
> 11.12.2015 14:09:36.753 *INFO* [FelixStartLevel] my.custom.bundle BundleEvent 
> RESOLVED
> 11.12.2015 14:09:36.753 *INFO* [FelixStartLevel] my.custom.bundle BundleEvent 
> STARTING
> 11.12.2015 14:09:36.754 INFO [OsgiInstallerImpl] 
> org.apache.sling.installer.core.impl.tasks.BundleUpdateTask Removing failing 
> update task - unable to retry: BundleUpdateTask: 
> TaskResource(url=jcrinstall:/apps/myapp/install/my.custom.bundle-1.5.6-SNAPSHOT.jar,
>  entity=bundle:my.custom.bundle, state=INSTALL, 
> attributes=[org.apache.sling.installer.api.tasks.ResourceTransformer=:28:84:15:,
>  Bundle-SymbolicName=my.custom.bundle, Bundle-Version=1.5.6-SNAPSHOT], 
> digest=1449838063263)
> org.osgi.framework.BundleException: Bundle my.custom.bundle [252] cannot be 
> update, since it is either starting or stopping.
> at org.apache.felix.framework.Felix.updateBundle(Felix.java:2311)
> at org.apache.felix.framework.BundleImpl.update(BundleImpl.java:995)
> at 
> org.apache.sling.installer.core.impl.tasks.BundleUpdateTask.execute(BundleUpdateTask.java:92)
> at 
> org.apache.sling.installer.core.impl.OsgiInstallerImpl.doExecuteTasks(OsgiInstallerImpl.java:847)
> at 
> org.apache.sling.installer.core.impl.OsgiInstallerImpl.executeTasks(OsgiInstallerImpl.java:689)
> at 
> org.apache.sling.installer.core.impl.OsgiInstallerImpl.run(OsgiInstallerImpl.java:265)
> at java.lang.Thread.run(Thread.java:767)
> {code}
> I don't know for what reason the Felix.updateBundle() failed (see also 
> FELIX-5138 to get some more information in this case), but from my point of 
> view there should be a dedicated error handling just for the 
> {code}BundleImpl.update{code} call. Does it make sense to retry the 
> installation at a later point in time (maybe 3 times at max)?
> (I got this exception when I deployed a large number of bundles through the 
> JCR installer. It happens only once in a while, but it's an annoying task to 
> fix it manually.)



