In the context of https://issues.apache.org/jira/browse/SLING-5457 and https://issues.apache.org/jira/browse/SLING-6176 I am currently investigating the retry behaviour of InstallTasks. In some places the exception is caught and something like "Retrying later" is logged (e.g. in https://github.com/apache/sling/blob/trunk/installer/core/src/main/java/org/apache/sling/installer/core/impl/tasks/BundleInstallTask.java#L83). The resource state is left unmodified, so the same task is executed again in the next cycle.
In other places the task simply gives up (e.g. for bundle updates in https://github.com/apache/sling/blob/trunk/installer/core/src/main/java/org/apache/sling/installer/core/impl/tasks/BundleUpdateTask.java#L135, where the resource is just set to "ignore"; see also https://issues.apache.org/jira/browse/SLING-5457). Currently the number of retries is not counted, so a task that asks to be retried is repeated with every cycle of the installer.

I would propose the following changes:

1) If an installer task throws a (runtime) exception (https://github.com/apache/sling/blob/trunk/installer/core/src/main/java/org/apache/sling/installer/core/impl/OsgiInstallerImpl.java#L860), there is no retry at all. We just assume that all exceptions thrown from an InstallTask are non-recoverable. The state is always set to ignore and the error is persisted through the OsgiInstallerImpl.

2) In case of recoverable errors the installer task should not throw an exception but rather call a new method "InstallationContext.markForRetry()", which sets a boolean flag to true. The OSGi installer internally counts the number of retries and logs them. Once the maximum number of retries (configurable through an OSGi property) is reached, it behaves as in 1).

WDYT? Especially for the BundleUpdateTasks it makes sense to retry as well (https://issues.apache.org/jira/browse/SLING-5457).

Konrad
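A minimal sketch of how proposals 1) and 2) could fit together, assuming a single installer thread that runs tasks one at a time. All names here (TaskContext, Task, Outcome, RetryingExecutor, maxRetries) are hypothetical simplifications and not the real Sling installer API; TaskContext only models the proposed markForRetry() flag, and maxRetries stands in for the configurable OSGi property. Retries are counted in memory per resource id.

```java
import java.util.HashMap;
import java.util.Map;

public class RetrySketch {

    /** Hypothetical stand-in for the proposed InstallationContext.markForRetry(); not the real Sling API. */
    static final class TaskContext {
        private boolean retryRequested;

        /** Called by a task that hit a recoverable error. */
        void markForRetry() {
            retryRequested = true;
        }

        boolean isRetryRequested() {
            return retryRequested;
        }
    }

    /** Hypothetical minimal task abstraction (not the real InstallTask class). */
    @FunctionalInterface
    interface Task {
        void execute(TaskContext ctx);
    }

    enum Outcome { DONE, RETRY_NEXT_CYCLE, IGNORED }

    /** Executes tasks and counts retries per resource id, giving up after maxRetries. */
    static final class RetryingExecutor {
        private final int maxRetries; // in the proposal this would come from an OSGi property
        private final Map<String, Integer> retryCounts = new HashMap<>(); // not thread-safe: single installer thread assumed

        RetryingExecutor(int maxRetries) {
            this.maxRetries = maxRetries;
        }

        Outcome run(String resourceId, Task task) {
            TaskContext ctx = new TaskContext(); // fresh retry flag for every execution
            try {
                task.execute(ctx);
            } catch (RuntimeException e) {
                // Proposal 1): every exception is treated as non-recoverable.
                System.err.println("Giving up on " + resourceId + ": " + e);
                retryCounts.remove(resourceId);
                return Outcome.IGNORED;
            }
            if (!ctx.isRetryRequested()) {
                retryCounts.remove(resourceId); // success resets the counter
                return Outcome.DONE;
            }
            // Proposal 2): recoverable error, count it and retry in the next cycle.
            int attempts = retryCounts.merge(resourceId, 1, Integer::sum);
            if (attempts > maxRetries) {
                System.err.println("Max retries (" + maxRetries + ") reached for " + resourceId);
                retryCounts.remove(resourceId);
                return Outcome.IGNORED; // same handling as proposal 1)
            }
            System.out.println("Retry " + attempts + "/" + maxRetries + " for " + resourceId);
            return Outcome.RETRY_NEXT_CYCLE;
        }
    }

    public static void main(String[] args) {
        RetryingExecutor executor = new RetryingExecutor(2);
        // Simulates e.g. a bundle update that keeps failing with a recoverable error.
        Task alwaysUnavailable = ctx -> ctx.markForRetry();
        for (int cycle = 1; cycle <= 3; cycle++) {
            System.out.println("cycle " + cycle + ": " + executor.run("bundle-a", alwaysUnavailable));
        }
    }
}
```

With maxRetries = 2 the three cycles yield RETRY_NEXT_CYCLE, RETRY_NEXT_CYCLE and IGNORED. Keeping the counter in the installer rather than in the task means tasks stay stateless and only signal "recoverable" via the flag, which matches the idea that the OSGi installer owns the retry policy.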