In the context of https://issues.apache.org/jira/browse/SLING-5457 and 
https://issues.apache.org/jira/browse/SLING-6176 I am currently investigating 
the retry behaviour of InstallTasks. In some places the exception is caught 
and something like "Retrying later" is logged (e.g. in 
https://github.com/apache/sling/blob/trunk/installer/core/src/main/java/org/apache/sling/installer/core/impl/tasks/BundleInstallTask.java#L83), 
while the resource state is left unchanged, so the same task is executed 
again in the next cycle.

In other places the task simply gives up (e.g. for bundle updates in 
https://github.com/apache/sling/blob/trunk/installer/core/src/main/java/org/apache/sling/installer/core/impl/tasks/BundleUpdateTask.java#L135, 
where it just sets the resource to "ignore"); see also 
https://issues.apache.org/jira/browse/SLING-5457.

Currently the number of retries is not counted, so a task that keeps failing 
is retried in every installer cycle without any limit.
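To make the difference between the two current patterns concrete, here is a small, hypothetical model. The class, method and state names are made up for illustration and are not the actual BundleInstallTask/BundleUpdateTask code: a task that leaves its resource in the install state is picked up again in every cycle, while a task that sets the resource to ignore runs only once.

```java
import java.util.function.Consumer;

/** Hypothetical model of the current behaviour; not the real Sling installer code. */
public class CurrentBehaviourModel {

    enum State { INSTALL, IGNORED }

    static final class Resource {
        State state = State.INSTALL;
        int attempts;
    }

    /** "Retrying later" pattern: the failure is caught and logged, the state stays INSTALL. */
    static void retryLaterStyle(Resource r) {
        r.attempts++;
    }

    /** "Give up" pattern: the failure is final, the resource is set to ignore. */
    static void giveUpStyle(Resource r) {
        r.attempts++;
        r.state = State.IGNORED;
    }

    /** Simulates installer cycles; a task is only executed while the resource is in INSTALL state. */
    static int attempts(Consumer<Resource> task, int cycles) {
        Resource r = new Resource();
        for (int i = 0; i < cycles; i++) {
            if (r.state == State.INSTALL) {
                task.accept(r);
            }
        }
        return r.attempts;
    }

    public static void main(String[] args) {
        // Nothing counts the attempts, so the first pattern runs once per cycle without a limit.
        System.out.println(attempts(CurrentBehaviourModel::retryLaterStyle, 1000));
        // The second pattern never retries, even if the failure was only temporary.
        System.out.println(attempts(CurrentBehaviourModel::giveUpStyle, 1000));
    }
}
```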

I would propose the following changes:
1) If an installer task throws a (runtime) exception 
(https://github.com/apache/sling/blob/trunk/installer/core/src/main/java/org/apache/sling/installer/core/impl/OsgiInstallerImpl.java#L860), 
there is no retry at all. We simply assume that all exceptions thrown 
from an InstallTask are non-recoverable. The state is always set to ignore 
and the error is persisted through the OsgiInstallerImpl.
2) For recoverable errors the installer task should not throw an 
exception but instead call a new method "InstallationContext.markForRetry()", 
which sets a boolean flag to true. The OSGi installer will internally count 
the number of retries and log them. Once the maximum number of retries 
(configurable through an OSGi property) is reached, it behaves as in 1).
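A minimal sketch of how the proposed handling could look inside the installer loop. Only markForRetry() comes from the proposal; Context, Task, Installer, Outcome, the errors map and the maxRetries parameter are hypothetical stand-ins, not the real Sling installer API, and the OSGi property is replaced by a constructor argument.

```java
import java.util.HashMap;
import java.util.Map;

public class RetrySketch {

    /** Hypothetical stand-in for InstallationContext, extended with the proposed markForRetry(). */
    static final class Context {
        private boolean retry;

        void markForRetry() { retry = true; }

        boolean isMarkedForRetry() { return retry; }
    }

    /** Hypothetical stand-in for InstallTask.execute(InstallationContext). */
    @FunctionalInterface
    interface Task {
        void execute(Context ctx);
    }

    enum Outcome { DONE, RETRY_LATER, IGNORED }

    static final class Installer {
        private final int maxRetries; // assumed to come from an OSGi configuration property
        private final Map<String, Integer> retries = new HashMap<>();
        final Map<String, String> errors = new HashMap<>(); // stands in for persisting the error

        Installer(int maxRetries) { this.maxRetries = maxRetries; }

        Outcome run(String resourceId, Task task) {
            Context ctx = new Context();
            try {
                task.execute(ctx);
            } catch (RuntimeException e) {
                // 1) Every exception is treated as non-recoverable: no retry, resource is ignored.
                return giveUp(resourceId, e.toString());
            }
            if (!ctx.isMarkedForRetry()) {
                retries.remove(resourceId); // success resets the counter
                return Outcome.DONE;
            }
            // 2) Recoverable error: count and log the retry, give up once the limit is exceeded.
            int count = retries.merge(resourceId, 1, Integer::sum);
            if (count > maxRetries) {
                return giveUp(resourceId, "giving up after " + maxRetries + " retries");
            }
            System.out.println("Retry " + count + "/" + maxRetries + " scheduled for " + resourceId);
            return Outcome.RETRY_LATER;
        }

        private Outcome giveUp(String resourceId, String reason) {
            retries.remove(resourceId);
            errors.put(resourceId, reason);
            return Outcome.IGNORED;
        }
    }

    public static void main(String[] args) {
        Installer installer = new Installer(3);
        for (int cycle = 1; cycle <= 4; cycle++) {
            // A task that keeps failing in a recoverable way and asks to be retried.
            System.out.println(installer.run("bundle-a", Context::markForRetry));
        }
    }
}
```

With maxRetries set to 3, a task that keeps calling markForRetry() is retried three times and the resource is set to ignore on the fourth failed execution; a successful run resets the counter.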

WDYT?
For BundleUpdateTasks in particular it makes sense to retry as well 
(https://issues.apache.org/jira/browse/SLING-5457).

Konrad
