Thanks for the information, Robert.

To replicate the issue, all I needed was a MongoDB instance (I used a full
replica set; see my instructions in a previous email about how to get one
going using podman) and a single process running Sling.

The problem does happen when I do the following:

2. Start Sling instance A, wait for it to start
3. Stop Sling instance A, wait for it to stop
4. Start Sling instance B - Error

But let me add more:

5. Start Sling Instance A again - Success (note I didn't remove the sling
dir)
6. Start Sling instance B again - Success (note I didn't remove the sling
dir)

This means that even if Sling recreates the sling directory and fails on
that startup, the next start will succeed. Unfortunately, we don't have
that luxury in containers because the sling directory is not persisted.

I think this is a bug, but I'll keep playing with it a bit to see if I can
find out more.

Carlos

On Mon, Feb 17, 2020 at 5:23 AM Robert Munteanu <romb...@apache.org> wrote:

> On Fri, 2020-02-14 at 15:41 -0500, Carlos Munoz wrote:
> > Robert, I managed to replicate the issue in a local, non-containerized
> > environment (!!!).
> >
> > The problem seems to be when the database is kept but the 'sling'
> > directory
> > is cleared out across restarts (as it is for us when the container
> > goes
> > away). As I said before this doesn't seem to be a problem with the
> > Sling 11
> > bundles.
> >
> > The first basic solution will be to persist the 'sling' directory
> > across
> > restarts, and I was wondering if this is a bug, or as designed.
>
> I think this should work.
>
> >
> > I also wonder if once persisted, multiple containers could share this
> > directory.
>
> This directory can't be shared, as it holds runtime data related to
> Sling. For instance, a bundle that is started in instance A could be
> starting on instance B.
>
> There is at least one file ( sling.id ) that holds data that must not
> be the same between instances.
>
> So I would advise marking the directory as container-private as a
> first step.
>
> Robert
>
> >
> > Regards,
> >
> > Carlos
> >
> >
> > On Fri, Feb 14, 2020 at 3:17 PM Carlos Munoz <camu...@redhat.com>
> > wrote:
> >
> > > Thanks Robert (and once again I can't stress enough how grateful I
> > > am for
> > > all your help).
> > >
> > > Right now we deploy our container with the expectation that the
> > > mongo db
> > > is the only necessary state we need to keep; everything else is
> > > throwaway.
> > > This means that a totally new container connected to the mongodb
> > > should
> > > pick up the state and run the same as the first time it was fired
> > > up. Do
> > > you think this is an incorrect assumption? If so, what are other
> > > pieces of
> > > state we should be keeping for subsequent restarts?
> > >
> > > This assumption has worked well for us with the current sling 11
> > > release,
> > > but it seems to break with the more up-to-date bundles. Perhaps
> > > running
> > > Sling in a container is just not meant to be.
> > >
> > > Regards,
> > >
> > > Carlos
> > >
> > >
> > > On Fri, Feb 14, 2020 at 2:21 PM Robert Munteanu <romb...@apache.org
> > > >
> > > wrote:
> > >
> > > > Hi Carlos,
> > > >
> > > > On Fri, 2020-02-14 at 11:50 -0500, Carlos Munoz wrote:
> > > > > Thanks Bertrand. How can I run Sling with DEBUG-level logs for
> > > > > every
> > > > > bundle? I tried passing a few configuration arguments from the
> > > > > command line
> > > > > but nothing seemed to work.
> > > >
> > > > Try configuring the LogManager to debug at
> > > >
> > > > https://github.com/apache/sling-org-apache-sling-starter/blob/8ba34e28fbea2feb4c61767dde510aa94d86fa0a/src/main/provisioning/sling.txt#L138
> > > >
> > > > Thanks,
> > > > Robert
> > > >
> > > > > Carlos
> > > > >
> > > > > On Fri, Feb 14, 2020 at 4:32 AM Bertrand Delacretaz <
> > > > > bdelacre...@apache.org>
> > > > > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > On Thu, Feb 13, 2020 at 8:47 PM Carlos Munoz <
> > > > > > camu...@redhat.com>
> > > > > > wrote:
> > > > > > > ...Is there a reason why the Jcr repository could be
> > > > > > > restarting?
> > > > > > > And what
> > > > > > > class could we start looking into to debug if this is the
> > > > > > > case?...
> > > > > >
> > > > > > It's not uncommon to see extra restarts of OSGi components at
> > > > > > startup,
> > > > > > for various reasons.
> > > > > >
> > > > > > The simplest way to detect and log multiple repository
> > > > > > startups
> > > > > > might
> > > > > > be to implement a SlingRepositoryInitializer service [1]
> > > > > > that's
> > > > > > called
> > > > > > at every startup, or use the logs of an existing one like the
> > > > > > JCR
> > > > > > RepositoryInitializer [2] if that has anything to process in
> > > > > > your
> > > > > > system.
> > > > > >
> > > > > > -Bertrand
> > > > > >
> > > > > > [1]
> > > > > >
> > > >
> https://sling.apache.org/documentation/bundles/repository-initialization.html#slingrepositoryinitializer
> > > > > > [2]
> > > > > >
> > > >
> https://github.com/apache/sling-org-apache-sling-jcr-repoinit/blob/41dfe606f99ca71baee8d9054d3ec6e9b896b12e/src/main/java/org/apache/sling/jcr/repoinit/impl/RepositoryInitializer.java#L98
> > > > > >
>
>
