On Sun, Oct 15, 2017 at 8:47 AM, Muthukumaran K <muthukumara...@ericsson.com
> wrote:

> Hi Tom,
>
>
>
> So, we should still be doing the bundle 0 stop for the quarantine case? I
> presume so because this expectation comes from Akka – is that right?
>
>
>

Akka doesn't know anything about karaf/bundles - the app just needs to
restart the actor system once it's quarantined. For ODL that also means
restarting all the components that use the actor system, which is easiest to
do by restarting the karaf container, and that is accomplished by restarting
the framework bundle (0). However, the refactoring in that patch somehow
omitted passing '0', which means it just stops the enclosing bundle, and
consequently the actor system, without restarting anything.
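
To make the difference concrete, here is a minimal, self-contained sketch. The
Bundle and BundleContext interfaces below are hypothetical stand-ins reduced to
the calls discussed in this thread (the real OSGi Bundle.stop() also declares
BundleException), and RecordingContext, stopEnclosingBundle and
restartContainer are names made up for illustration. It only records which
bundle id each variant would stop; it does not restart anything.

```java
import java.util.ArrayList;
import java.util.List;

public class QuarantineRestartSketch {
    // Hypothetical stand-ins for the OSGi interfaces, not the real API.
    interface Bundle {
        long getBundleId();
        void stop();
    }

    interface BundleContext {
        Bundle getBundle();         // the bundle that owns this context
        Bundle getBundle(long id);  // lookup by id; 0 is the framework (system) bundle
    }

    /** Records which bundle ids were stopped so the two variants can be compared. */
    static final class RecordingContext implements BundleContext {
        final List<Long> stopped = new ArrayList<>();
        private final long ownId;

        RecordingContext(long ownId) {
            this.ownId = ownId;
        }

        private Bundle bundle(long id) {
            return new Bundle() {
                public long getBundleId() { return id; }
                public void stop() { stopped.add(id); }
            };
        }

        public Bundle getBundle() { return bundle(ownId); }
        public Bundle getBundle(long id) { return bundle(id); }
    }

    // Behaviour after the refactoring: only the enclosing bundle is stopped.
    static Runnable stopEnclosingBundle(BundleContext ctx) {
        return () -> ctx.getBundle().stop();
    }

    // Earlier behaviour: stop framework bundle 0 so the whole container goes down
    // and, with karaf.restart.jvm set, is restarted.
    static Runnable restartContainer(BundleContext ctx) {
        return () -> {
            System.setProperty("karaf.restart.jvm", "true");
            ctx.getBundle(0).stop();
        };
    }

    public static void main(String[] args) {
        RecordingContext enclosing = new RecordingContext(42);
        stopEnclosingBundle(enclosing).run();

        RecordingContext framework = new RecordingContext(42);
        restartContainer(framework).run();

        System.out.println("getBundle().stop() stopped ids:  " + enclosing.stopped);
        System.out.println("getBundle(0).stop() stopped ids: " + framework.stopped);
    }
}
```

The bundle id 42 is arbitrary; the point is only that the no-argument lookup
resolves to the caller's own bundle while id 0 resolves to the framework.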


> >>> If you want to push a patch to fix it, I'll merge it.
>
> Sure, Tom. I will run a local quarantine test with the change and then push it.
>
>
>
> Regards
>
> Muthu
>
>
>
>
>
>
>
> *From:* Tom Pantelis [mailto:tompante...@gmail.com]
> *Sent:* Friday, October 13, 2017 6:40 PM
> *To:* Muthukumaran K
> *Cc:* Daniel Farrell; Jamo Luhrsen; controller-dev@lists.opendaylight.org;
> integration-...@lists.opendaylight.org
>
> *Subject:* Re: [controller-dev] Best way to gracefully shutdown Karaf in
> ODL context
>
>
>
>
>
>
>
> On Fri, Oct 13, 2017 at 8:57 AM, Tom Pantelis <tompante...@gmail.com>
> wrote:
>
>
>
>
>
> On Fri, Oct 13, 2017 at 12:59 AM, Muthukumaran K <
> muthukumara...@ericsson.com> wrote:
>
> Thanks a lot for the pointers Daniel and JamO.
>
>
>
> The stop script at
> https://git.opendaylight.org/gerrit/gitweb?p=releng/builder.git;a=blob;f=jjb/packaging/stop-odl.sh;h=2e3e7bf15dfbe6e59bddfbfd4ce4805fb47b2a69;hb=refs/heads/master#l27
> aligns with my thinking too :)
>
>
>
> Just a clarification: can you recall any situation where the karaf PID
> lingered abnormally long (beyond 10 – 15 mins) during the stop phase? I have
> seen this once using the vanilla distro but have not been able to repro it
> for the past month or so, even after several day-to-day restarts. Maybe it
> was a local env issue. So, I was a bit hesitant to roll the approach of
> stop followed by waiting until the PID vanishes into production.
>
>
>
> @Tom, @Robert,
>
>
>
> Not directly related but I will fire away …
>
>
>
> Previously, the QuarantinedMonitorActor at
> https://github.com/opendaylight/controller/blob/master/opendaylight/md-sal/sal-clustering-commons/src/main/java/org/opendaylight/controller/cluster/common/actor/QuarantinedMonitorActor.java
> used to restart the entire container, and now on master the Quarantined
> state just restarts the ActorSystem – is my understanding right?
>
>
>
> It restarts the enclosing bundle:
>
>
>
> return QuarantinedMonitorActor.props(() -> {
>     // restart the entire karaf container
>     LOG.warn("Restarting karaf container");
>     System.setProperty("karaf.restart.jvm", "true");
>     bundleContext.getBundle().stop();
> });
>
>
>
> It used to restart bundle 0. Not sure why that was changed....
>
>
>
> Looks like this was inadvertently changed by
> https://git.opendaylight.org/gerrit/#/c/62451/ - it used to be
>
>      bundleContext.getBundle(0).stop();
>
>
>
> If you want to push a patch to fix it, I'll merge it.
>
>
>
>
>
>
>
> Regards
>
> Muthu
>
>
>
>
>
>
>
> *From:* Daniel Farrell [mailto:dfarr...@redhat.com]
> *Sent:* Friday, October 13, 2017 6:19 AM
> *To:* Jamo Luhrsen; Muthukumaran K; controller-dev@lists.opendaylight.org;
> integration-...@lists.opendaylight.org
> *Subject:* Re: [controller-dev] Best way to gracefully shutdown Karaf in
> ODL context
>
>
>
> Hey Muthu,
>
>
>
> Yes, I think you should take a look at the systemd configuration we ship
> in ODL's packages. As far as I know it does a good job of
> starting/stopping/restarting ODL's service.
>
>
>
> https://git.opendaylight.org/gerrit/gitweb?p=integration/packaging.git;a=blob;f=packages/rpm/unitfiles/opendaylight.service;h=ac436592d2880047986b856c7dd6810665ba0d3e;hb=refs/heads/master
>
>
>
> Here's a Nitrogen RPM that contains that systemd config:
>
>
>
> http://cbs.centos.org/repos/nfv7-opendaylight-70-release/x86_64/os/Packages/opendaylight-7.0.0-1.el7.noarch.rpm
>
>
>
> This test job shows examples of `sudo systemctl [start, stop, status]`
> working:
>
>
>
> https://jenkins.opendaylight.org/releng/job/packaging-test-rpm-master
>
>
>
> The logic for that job is here:
>
>
>
> https://git.opendaylight.org/gerrit/gitweb?p=releng/builder.git;a=blob;f=jjb/packaging/packaging.yaml;h=e4de235ca543506063b7fb57c3d257f0b983abe3;hb=refs/heads/master#l346
>
>
>
> That systemd config is also exercised in tests for puppet-opendaylight,
> ansible-opendaylight, OPNFV Apex and other OPNFV installers.
>
>
>
> It seems like you've put some good thought into this, so if you have any
> suggestions for things we can do better please let us know. :)
>
>
>
> Daniel
>
>
>
> On Thu, Oct 12, 2017 at 11:47 AM Jamo Luhrsen <jluhr...@gmail.com> wrote:
>
> +Daniel and Integration-dev,
>
> Daniel,
>
> does our rpm package and the systemd work you did for it answer any of
> Muthu's
> questions below? I'm assuming it *IS* the answer, but you will know better.
>
> Thanks,
> JamO
>
> On 10/12/2017 04:56 AM, Muthukumaran K wrote:
> > Hi,
> >
> >
> > *Context*: Figuring out the best possible way to gracefully shut down the
> > Karaf process using standard Karaf commands.
> >
> > This is required because a framework-level shutdown sequence in Karaf
> > gives the framework the opportunity to properly execute bundle lifecycle
> > listeners. What I mean is – an abrupt kill can potentially prevent
> > lifecycle listeners from being properly executed and may also impact any
> > in-flight transactions which may be in various stages of replication
> > and/or commit phases. This can in turn lead to trouble during the
> > recovery / restart phase.
> >
> >
> >
> > So, I thought of a middle ground where
> >
> > 1)      We execute karaf stop, followed by
> >
> > 2)      A periodic check of whether the last PID has indeed terminated
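
A minimal sketch of step 2 above, polling until a PID disappears or a timeout
elapses. It assumes the PID was captured before running karaf stop; the class
name, method name and the timeout values are hypothetical, and ProcessHandle
requires Java 9 or later (the thread dump in this mail comes from a Java 8 JVM,
where a shell loop over the PID would be the equivalent).

```java
import java.time.Duration;
import java.util.Optional;

public class WaitForPidExit {
    /**
     * Polls until the process with the given PID is gone or the timeout elapses.
     * Returns true if the process was gone before the timeout, false otherwise
     * (including when the waiting thread is interrupted).
     */
    static boolean waitForExit(long pid, Duration timeout, Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            Optional<ProcessHandle> handle = ProcessHandle.of(pid);
            if (!handle.isPresent() || !handle.get().isAlive()) {
                return true; // PID no longer exists or the process has terminated
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // still alive after the timeout
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: WaitForPidExit <pid>");
            return;
        }
        long pid = Long.parseLong(args[0]);
        // Illustrative values only: wait up to 5 minutes, checking every 5 seconds.
        boolean exited = waitForExit(pid, Duration.ofMinutes(5), Duration.ofSeconds(5));
        System.out.println(exited ? "process exited" : "process still running");
    }
}
```

What to do when the timeout is hit (keep waiting, alert, or escalate to a kill)
is a policy decision this sketch deliberately leaves open.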
> >
> >
> >
> > Doing a straight kill -9 could lead to rare heisenbugs in which recovery
> > suffers, since there may not be room for lifecycle listeners to execute
> > (unless Karaf handles it as a unified shutdown hook and executes the same
> > path as that of stop or any other graceful shutdown method).
> >
> >
> >
> > Has anybody tried any better methods without side effects?
> >
> >
> >
> >
> >
> > *Option tried and observations*
> >
> > I used Karaf stop followed by the Karaf status command to check whether
> > the process has come to a graceful termination. But it appears that
> > although the ‘status’ command reports the Karaf instance as ‘Not Running’,
> > the PID still lingers for roughly 2 to 3 mins in the ODL context. I am
> > inclined to think that some lifecycle listeners are indeed still
> > executing. During this ‘PID lingering’ phase, the thread dump hints that
> > the System Bundle Shutdown thread is waiting for the Blueprint container
> > to shut down the components (probably executing the lifecycle listeners
> > at application and platform levels).
> >
> >
> >
> > "System Bundle Shutdown" #1582 daemon prio=5 os_prio=0 tid=0x00007fb05003d800 nid=0xe68 waiting on condition [0x00007faf77678000]
> >    java.lang.Thread.State: TIMED_WAITING (parking)
> >         at sun.misc.Unsafe.park(Native Method)
> >         - parking to wait for  <0x00000000e9064250> (a com.google.common.util.concurrent.AbstractFuture$Sync)
> >         at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> >         at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
> >         at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
> >         at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:268)
> >         at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:96)
> >         at org.opendaylight.openflowplugin.openflow.md.core.MDController.stop(MDController.java:358)
> >         at org.opendaylight.openflowplugin.openflow.md.core.sal.OpenflowPluginProvider.close(OpenflowPluginProvider.java:121)
> >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >         at java.lang.reflect.Method.invoke(Method.java:498)
> >         at org.apache.aries.blueprint.utils.ReflectionUtils.invoke(ReflectionUtils.java:299)
> >         at org.apache.aries.blueprint.container.BeanRecipe.invoke(BeanRecipe.java:980)
> >         at org.apache.aries.blueprint.container.BeanRecipe.destroy(BeanRecipe.java:887)
> >         at org.apache.aries.blueprint.container.BlueprintRepository.destroy(BlueprintRepository.java:329)
> >         at org.apache.aries.blueprint.container.BlueprintContainerImpl.destroyComponents(BlueprintContainerImpl.java:765)
> >         at org.apache.aries.blueprint.container.BlueprintContainerImpl.tidyupComponents(BlueprintContainerImpl.java:964)
> >         at org.apache.aries.blueprint.container.BlueprintContainerImpl.destroy(BlueprintContainerImpl.java:909)
> >         at org.apache.aries.blueprint.container.BlueprintExtender$3.run(BlueprintExtender.java:325)
> >         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> >         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> >         at org.apache.aries.blueprint.container.BlueprintExtender.destroyContainer(BlueprintExtender.java:346)
> >         at org.apache.aries.blueprint.container.BlueprintExtender.access$400(BlueprintExtender.java:68)
> >         at org.apache.aries.blueprint.container.BlueprintExtender$BlueprintContainerServiceImpl.destroyContainer(BlueprintExtender.java:624)
> >         at org.opendaylight.controller.blueprint.BlueprintBundleTracker.shutdownAllContainers(BlueprintBundleTracker.java:251)
> >         at org.opendaylight.controller.blueprint.BlueprintBundleTracker.bundleChanged(BlueprintBundleTracker.java:150)
> >         at org.eclipse.osgi.framework.internal.core.BundleContextImpl.dispatchEvent(BundleContextImpl.java:847)
> >         at org.eclipse.osgi.framework.eventmgr.EventManager.dispatchEvent(EventManager.java:230)
> >         at org.eclipse.osgi.framework.eventmgr.ListenerQueue.dispatchEventSynchronous(ListenerQueue.java:148)
> >         at org.eclipse.osgi.framework.internal.core.Framework.publishBundleEventPrivileged(Framework.java:1568)
> >         at org.eclipse.osgi.framework.internal.core.Framework.publishBundleEvent(Framework.java:1504)
> >         at org.eclipse.osgi.framework.internal.core.Framework.publishBundleEvent(Framework.java:1499)
> >         at org.eclipse.osgi.framework.internal.core.Framework.shutdown(Framework.java:681)
> >         - locked <0x000000008060b4d0> (a org.eclipse.osgi.framework.internal.core.Framework)
> >         at org.eclipse.osgi.framework.internal.core.Framework.close(Framework.java:600)
> >         - locked <0x000000008060b4d0> (a org.eclipse.osgi.framework.internal.core.Framework)
> >         at org.eclipse.osgi.framework.internal.core.InternalSystemBundle$1.run(InternalSystemBundle.java:261)
> >         at java.lang.Thread.run(Thread.java:745)
> >
> >
> >
> > "Framework Active Thread" #12 prio=5 os_prio=0 tid=0x00007fb0dc4bd000 nid=0x52a waiting for monitor entry [0x00007fb0c14b0000]
> >    java.lang.Thread.State: BLOCKED (on object monitor)
> >         at java.lang.Object.wait(Native Method)
> >         at org.eclipse.osgi.framework.internal.core.Framework.run(Framework.java:1862)
> >         - locked <0x000000008060b4d0> (a org.eclipse.osgi.framework.internal.core.Framework)
> >         at java.lang.Thread.run(Thread.java:745)
> >
> >
> >
> > "main" #1 prio=5 os_prio=0 tid=0x00007fb0dc00b800 nid=0x514 in Object.wait() [0x00007fb0e5134000]
> >    java.lang.Thread.State: WAITING (on object monitor)
> >         at java.lang.Object.wait(Native Method)
> >         - waiting on <0x000000008060b4d0> (a org.eclipse.osgi.framework.internal.core.Framework)
> >         at org.eclipse.osgi.framework.internal.core.Framework.waitForStop(Framework.java:1884)
> >         - locked <0x000000008060b4d0> (a org.eclipse.osgi.framework.internal.core.Framework)
> >         at org.eclipse.osgi.framework.internal.core.EquinoxLauncher.waitForStop(EquinoxLauncher.java:118)
> >         at org.eclipse.osgi.launch.Equinox.waitForStop(Equinox.java:182)
> >         at org.apache.karaf.main.Main.awaitShutdown(Main.java:487)
> >         at org.apache.karaf.main.Main.main(Main.java:177)
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > Regards
> >
> > Muthu
> >
> >
> >
> >
> >
> >
> >
> > _______________________________________________
> > controller-dev mailing list
> > controller-dev@lists.opendaylight.org
> > https://lists.opendaylight.org/mailman/listinfo/controller-dev
> >
>
>
>
>
>
>
>