[ https://issues.apache.org/jira/browse/SLING-10372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17342618#comment-17342618 ]
Konrad Windszus edited comment on SLING-10372 at 5/11/21, 2:59 PM:
-------------------------------------------------------------------

Indeed this seems to be a race condition. The following seems to happen in your case, in this order:
1. {{PackageTransformer}} is started (once SlingRepository is up)
2. {{PackageTransformer.createTask()}} is called from the OSGi Installer; the returned {{InstallPackageTask}} is put into the queue
3. {{SlingRepository}} is stopped (after stopping component {{PackageTransformer}})
4. {{OsgiInstallerImpl.executeTasks}} calls {{execute}} on the {{InstallPackageTask}} reference (which was returned in 2)

I don't have a good idea yet how to solve that. IMHO the OsgiInstallerImpl should never call tasks returned from deactivated {{InstallTaskFactory}} s.

[~cziegeler] Do you have a suggestion how to prevent executing tasks of deactivated {{InstallTaskFactories}}?

was (Author: kwin):
Indeed this seems a race condition. The following seems to happen in your case
1. {{PackageTransformer}} is started (one SlingRepository is up)
2. {{PackageTransfomer.createTask()}} is called from OSGi Installer, returned {{InstalledPackageTask}} is put into queue
3. {{SlingRepository}} is stopped (after stopping component {{PackageTransformer}})
4. {{OsgiInstallerImpl.executeTasks}} is calling {{execute}} on the {{InstallPackageTask}} reference (which has been returned in 3)

I don't yet have a good idea how to solve that yet. IMHO the OSGiInstallerImpl should never call tasks returned from deactivated {{InstallTaskFactory}} s.

[~cziegeler] Do you have a suggestion how to prevent executing tasks of deactivated {{InstallTaskFactories}}?

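The guard proposed in the comment above (never executing a queued task whose factory has been deactivated in the meantime) could be sketched roughly as follows. This is only an illustration of the idea, not the Sling installer API: {{Factory}}, {{QueuedTask}} and this {{executeTasks}} method are hypothetical stand-ins, and how a real fix would track factory liveness is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

public class DeactivatedFactoryGuard {

    /** Hypothetical factory; it only tracks whether it is still active. */
    public static class Factory {
        private final AtomicBoolean active = new AtomicBoolean(true);

        public boolean isActive() { return active.get(); }

        public void deactivate() { active.set(false); }

        /** Creates a task that remembers which factory produced it. */
        public QueuedTask createTask(String name) {
            return new QueuedTask(this, name);
        }
    }

    /** Hypothetical queued task holding a back-reference to its factory. */
    public static class QueuedTask {
        final Factory owner;
        final String name;

        QueuedTask(Factory owner, String name) {
            this.owner = owner;
            this.name = name;
        }
    }

    /** Walks the queue and skips tasks whose factory is no longer active. */
    public static List<String> executeTasks(List<QueuedTask> queue) {
        List<String> outcomes = new ArrayList<>();
        for (QueuedTask task : queue) {
            if (task.owner.isActive()) {
                outcomes.add("executed " + task.name);
            } else {
                outcomes.add("skipped " + task.name);
            }
        }
        return outcomes;
    }

    public static void main(String[] args) {
        Factory factory = new Factory();
        List<QueuedTask> queue = new ArrayList<>();
        queue.add(factory.createTask("package-a")); // step 2: task is queued
        factory.deactivate();                       // step 3: factory goes away
        System.out.println(executeTasks(queue));    // step 4: queue is processed later
    }
}
```

Note that a plain liveness check like this still leaves a small window between the check and the actual execution, so a real fix would probably need some form of synchronization with the factory's deactivation as well.
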
> NPE during package installation during startup
> ----------------------------------------------
>
>                 Key: SLING-10372
>                 URL: https://issues.apache.org/jira/browse/SLING-10372
>             Project: Sling
>          Issue Type: Bug
>          Components: Installer
>    Affects Versions: Starter 12
>         Environment: Sling-Starter 12-SNAPSHOT (commit 0e6a8e41) with JDK 11 on MacOS
>            Reporter: Hans-Peter Stoerr
>            Priority: Minor
>             Fix For: Installer Core 3.11.6
>
>         Attachments: error.log
>
>
> (As requested in SLING-10362) When starting a Sling Starter 12 with a feature
> archive containing a couple of packages and having a couple of packages
> installed with the Sling [fileinstaller
> provider|https://sling.apache.org/documentation/bundles/file-installer-provider.html],
> I often get an NPE; the stack trace is appended. This stops the installation
> of the package when it happens. It isn't about that particular package,
> though - if I take out other packages from the automatic installation or put
> it into the fileinstall directory later, it installs happily.
> It's rather difficult to give detailed steps to reproduce that, but I have a
> guess what's happening. I do have a particular setting where it always
> happens on my machine, but that might be sensitive to the speed of my machine
> and whatnot. Basically, I'm starting the feature launcher with a FAR
> containing several packages of ours, and also give the arguments
> -Dsling.fileinstall.dir=launcher/fileinstall -Dfelix.startlevel.bundle=30
> to the launcher, having placed several packages in the fileinstall directory.
> I guess the NPE happens only when enough packages are placed there, and it
> happens only on the first startup (i.e., there was no launcher directory yet).
> I had a look around with the debugger: it seems the SlingRepository was
> stopped but not yet started again for a restart just before the
> PackageTransformer is trying to process the package, probably due to some
> kind of configuration change.
> It tries to access the repository via a reference of the OakSlingRepository
> whose manager has already been stopped, so that getRepository() returns
> null. Hence the NPE. Probably the
> org.apache.sling.installer.factory.packages.impl.PackageTransformer should
> somehow handle such temporary failures that don't have anything to do with
> the package? Another way to solve it seems to be to set the start level of
> the org.apache.sling.installer.factory.packages bundle to 21, probably
> because so much happens at once when start level 20 is reached that this
> transition is not a good time to install packages.
> Here is the stack trace that marks the error. I'll attach a logfile for some
> more context. BTW: the exceptions "Can't create child on a synthetic root"
> in the log file might also be interesting. I receive them regularly during
> startup, but they are probably not related to this problem, as they also
> occur when things work properly.
> {code}
> 11.05.2021 13:27:49.462 *ERROR* [OsgiInstallerImpl] org.apache.sling.installer.factory.packages.impl.PackageTransformer Error while processing install content package task of TaskResource(url=fileinstallff43091e0ee8ac91416c79636bdce5f4:/Users/hps/dev/composum/composum-launch/feature/composumstarter/target/launcher/fileinstall/99/composum-site-app-package-1.0.0-SNAPSHOT.zip, entity=content-package:tenants/ist:composum-site-app-package, state=INSTALL, attributes=[org.apache.sling.installer.api.tasks.ResourceTransformer=:27:23:1243:, package-id=tenants/ist:composum-site-app-package:1.0.0-SNAPSHOT, Bundle-Version=1.0.0.SNAPSHOT], digest=1620718306467) due to null, no retry.
> java.lang.NullPointerException: null
>     at org.apache.sling.jcr.oak.server.internal.OakSlingRepository$2.run(OakSlingRepository.java:99) [org.apache.sling.jcr.oak.server:1.2.10]
>     at org.apache.sling.jcr.oak.server.internal.OakSlingRepository$2.run(OakSlingRepository.java:96) [org.apache.sling.jcr.oak.server:1.2.10]
>     at java.base/java.security.AccessController.doPrivileged(Native Method)
>     at java.base/javax.security.auth.Subject.doAsPrivileged(Subject.java:550)
>     at org.apache.sling.jcr.oak.server.internal.OakSlingRepository.createServiceSession(OakSlingRepository.java:96) [org.apache.sling.jcr.oak.server:1.2.10]
>     at org.apache.sling.jcr.base.AbstractSlingRepository2.createServiceSession(AbstractSlingRepository2.java:166) [org.apache.sling.jcr.base:3.1.6]
>     at org.apache.sling.jcr.base.AbstractSlingRepository2.loginService(AbstractSlingRepository2.java:383) [org.apache.sling.jcr.base:3.1.6]
>     at org.apache.sling.installer.factory.packages.impl.PackageTransformer$AbstractPackageInstallTask.execute(PackageTransformer.java:263) [org.apache.sling.installer.factory.packages:1.0.4]
>     at org.apache.sling.installer.core.impl.OsgiInstallerImpl.doExecuteTasks(OsgiInstallerImpl.java:918) [org.apache.sling.installer.core:3.11.4]
>     at org.apache.sling.installer.core.impl.OsgiInstallerImpl.executeTasks(OsgiInstallerImpl.java:755) [org.apache.sling.installer.core:3.11.4]
>     at org.apache.sling.installer.core.impl.OsgiInstallerImpl.run(OsgiInstallerImpl.java:304) [org.apache.sling.installer.core:3.11.4]
>     at java.base/java.lang.Thread.run(Thread.java:834)
> {code}
> I'm not sure whether this is a Minor or Major - it breaks things in the
> startup, but I've found a way to modify the starter to avoid it, see above.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)