Yes, I confirm that these steps reproduce the problem for me. Can you please file an issue under https://issues.apache.org/jira/browse/SLING so we can better track this?

On Mon, 2020-02-17 at 11:24 -0500, Carlos Munoz wrote:
> Thanks for the information Robert.
>
> To replicate the issue all I needed was a mongodb (I used a full
> replica set, see my instructions in a previous email about how to get
> one going using podman) and a single process running sling.
>
> The problem does happen when I do the following:
>
> 2. Start Sling instance A, wait for it to start
> 3. Stop Sling instance A, wait for it to stop
> 4. Start Sling instance B - Error
>
> but let me add more
>
> 5. Start Sling Instance A again - Success (note I didn't remove the
>    sling dir)
> 6. Start Sling instance B again - Success (note I didn't remove the
>    sling dir)
>
> this means that even if Sling recreates the sling directory and fails
> the startup, next time it will succeed. Unfortunately we don't have
> that luxury in containers because the sling directory is not
> persisted.
>
> I think this is a bug, but I'll keep playing with it a bit to see if
> I can find out more.
>
> Carlos
>
> On Mon, Feb 17, 2020 at 5:23 AM Robert Munteanu <romb...@apache.org>
> wrote:
>
> > On Fri, 2020-02-14 at 15:41 -0500, Carlos Munoz wrote:
> > > Robert I managed to replicate the issue in a local,
> > > non-containerized environment (!!!).
> > >
> > > The problem seems to be when the database is kept but the 'sling'
> > > directory is cleared out across restarts (as it is for us when
> > > the container goes away). As I said before this doesn't seem to
> > > be a problem with the Sling 11 bundles.
> > >
> > > The first basic solution will be to persist the 'sling' directory
> > > across restarts, and I was wondering if this is a bug, or as
> > > designed.
> >
> > I think this should work.
> >
> > > I also wonder if once persisted, multiple containers could share
> > > this directory.
> >
> > This directory can't be shared, as it holds runtime data related to
> > Sling. For instance, a bundle that is started in instance A could
> > be starting on instance B.
> >
> > There is at least one file ( sling.id ) that holds data that must
> > not be the same between instances.
> >
> > So I would advise as marking the directory as container-private as
> > a first step.
> >
> > Robert
> >
> > > Regards,
> > >
> > > Carlos
> > >
> > > On Fri, Feb 14, 2020 at 3:17 PM Carlos Munoz <camu...@redhat.com>
> > > wrote:
> > >
> > > > Thanks Robert (and once again I can't stress enough how
> > > > grateful I am for all your help).
> > > >
> > > > Right now we deploy our container with the expectation that the
> > > > mongo db is the only necessary state we need to keep;
> > > > everything else is throwaway. This means that a totally new
> > > > container connected to the mongodb should pick up the state and
> > > > run the same as the first time it was fired up. Do you think
> > > > this is an incorrect assumption? If so, what are other pieces
> > > > of state we should be keeping for subsequent restarts?
> > > >
> > > > This assumption has worked well for us with the current sling
> > > > 11 release, but it seems to break with the more up-to-date
> > > > bundles. Perhaps running Sling in a container is just not meant
> > > > to be.
> > > >
> > > > Regards,
> > > >
> > > > Carlos
> > > >
> > > > On Fri, Feb 14, 2020 at 2:21 PM Robert Munteanu
> > > > <romb...@apache.org> wrote:
> > > >
> > > > > Hi Carlos,
> > > > >
> > > > > On Fri, 2020-02-14 at 11:50 -0500, Carlos Munoz wrote:
> > > > > > Thanks Bertrand. How can I run Sling with DEBUG-level logs
> > > > > > for every bundle? I tried passing a few configuration
> > > > > > arguments from the command line but nothing seemed to work.
> > > > >
> > > > > Try configuring the LogManager to debug at
> > > > >
> > > > > https://github.com/apache/sling-org-apache-sling-starter/blob/8ba34e28fbea2feb4c61767dde510aa94d86fa0a/src/main/provisioning/sling.txt#L138
> > > > >
> > > > > Thanks,
> > > > > Robert
> > > > >
> > > > > > Carlos
> > > > > >
> > > > > > On Fri, Feb 14, 2020 at 4:32 AM Bertrand Delacretaz
> > > > > > <bdelacre...@apache.org> wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > On Thu, Feb 13, 2020 at 8:47 PM Carlos Munoz
> > > > > > > <camu...@redhat.com> wrote:
> > > > > > > > ...Is there a reason why the Jcr repository could be
> > > > > > > > restarting? And what class could we start looking into
> > > > > > > > to debug if this is the case?...
> > > > > > >
> > > > > > > It's not uncommon to see extra restarts of OSGi
> > > > > > > components at startup, for various reasons.
> > > > > > >
> > > > > > > The simplest way to detect and log multiple repository
> > > > > > > startups might be to implement a
> > > > > > > SlingRepositoryInitializer service [1] that's called at
> > > > > > > every startup, or use the logs of an existing one like
> > > > > > > the JCR RepositoryInitializer [2] if that has anything to
> > > > > > > process in your system.
> > > > > > >
> > > > > > > -Bertrand
> > > > > > >
> > > > > > > [1]
> > > > > > > https://sling.apache.org/documentation/bundles/repository-initialization.html#slingrepositoryinitializer
> > > > > > > [2]
> > > > > > > https://github.com/apache/sling-org-apache-sling-jcr-repoinit/blob/41dfe606f99ca71baee8d9054d3ec6e9b896b12e/src/main/java/org/apache/sling/jcr/repoinit/impl/RepositoryInitializer.java#L98
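
For reference, Bertrand's suggestion of a SlingRepositoryInitializer that logs every repository startup could look roughly like the sketch below. It is only an illustration under stated assumptions: the real interface is org.apache.sling.jcr.api.SlingRepositoryInitializer and its method takes a SlingRepository, but here a local stand-in interface (RepositoryInitializerStandIn, a made-up name) is declared so the code compiles without Sling on the classpath. The class name StartupCountingInitializer and the log messages are likewise hypothetical. In a real bundle the class would also need to be registered as an OSGi service, which is omitted here.

```java
import java.time.Instant;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Logger;

/**
 * Stand-in (assumption) for org.apache.sling.jcr.api.SlingRepositoryInitializer,
 * declared locally so the sketch compiles on its own. The real method
 * receives a SlingRepository instead of Object.
 */
interface RepositoryInitializerStandIn {
    void processRepository(Object repo) throws Exception;
}

/** Hypothetical initializer that counts and logs each repository startup. */
class StartupCountingInitializer implements RepositoryInitializerStandIn {

    private static final Logger LOG =
            Logger.getLogger(StartupCountingInitializer.class.getName());

    // Number of times this instance has been notified of a repository startup.
    private final AtomicInteger startups = new AtomicInteger();

    @Override
    public void processRepository(Object repo) {
        int n = startups.incrementAndGet();
        if (n > 1) {
            // More than one call on the same instance means the repository came up again.
            LOG.warning("Repository startup #" + n + " detected at " + Instant.now());
        } else {
            LOG.info("First repository startup at " + Instant.now());
        }
    }

    /** Returns how many startups this instance has seen so far. */
    int startupCount() {
        return startups.get();
    }
}
```

Note that the counter lives in the instance, so it resets if the component itself is recreated; the timestamped log lines are what would show repeated startups in that case.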