On Fri, Oct 13, 2017 at 12:59 AM, Muthukumaran K < muthukumara...@ericsson.com> wrote:
> Thanks a lot for the pointers Daniel and JamO.
>
> https://git.opendaylight.org/gerrit/gitweb?p=releng/builder.git;a=blob;f=jjb/packaging/stop-odl.sh;h=2e3e7bf15dfbe6e59bddfbfd4ce4805fb47b2a69;hb=refs/heads/master#l27 aligns with my thinking too :)
>
> Just a clarification: can you recollect any situation where the Karaf PID lingered abnormally long (beyond 10-15 mins) during the stop phase? I have seen this once using the vanilla distro, but was never able to reproduce it over the past month or so, even after several day-to-day restarts. Maybe it was a local environment issue. So I was a bit hesitant to roll the approach of stop followed by waiting until the PID vanishes into production.
>
> @Tom, @Robert,
>
> Not directly related, but I will fire away …
>
> Earlier, https://github.com/opendaylight/controller/blob/master/opendaylight/md-sal/sal-clustering-commons/src/main/java/org/opendaylight/controller/cluster/common/actor/QuarantinedMonitorActor.java used to restart the entire container, and now on master the Quarantined state just restarts the ActorSystem. Is my understanding right?

It restarts the enclosing bundle:

    return QuarantinedMonitorActor.props(() -> {
        // restart the entire karaf container
        LOG.warn("Restarting karaf container");
        System.setProperty("karaf.restart.jvm", "true");
        bundleContext.getBundle().stop();
    });

It used to restart bundle 0. Not sure why that was changed....

> Regards
>
> Muthu
>
> *From:* Daniel Farrell [mailto:dfarr...@redhat.com]
> *Sent:* Friday, October 13, 2017 6:19 AM
> *To:* Jamo Luhrsen; Muthukumaran K; controller-dev@lists.opendaylight.org; integration-...@lists.opendaylight.org
> *Subject:* Re: [controller-dev] Best way to gracefully shutdown Karaf in ODL context
>
> Hey Muthu,
>
> Yes, I think you should take a look at the systemd configuration we ship in ODL's packages.
> As far as I know it does a good job of starting/stopping/restarting ODL's service.
>
> https://git.opendaylight.org/gerrit/gitweb?p=integration/packaging.git;a=blob;f=packages/rpm/unitfiles/opendaylight.service;h=ac436592d2880047986b856c7dd6810665ba0d3e;hb=refs/heads/master
>
> Here's a Nitrogen RPM that contains that systemd config:
>
> http://cbs.centos.org/repos/nfv7-opendaylight-70-release/x86_64/os/Packages/opendaylight-7.0.0-1.el7.noarch.rpm
>
> This test job shows examples of `sudo systemctl [start, stop, status]` working:
>
> https://jenkins.opendaylight.org/releng/job/packaging-test-rpm-master
>
> The logic for that job is here:
>
> https://git.opendaylight.org/gerrit/gitweb?p=releng/builder.git;a=blob;f=jjb/packaging/packaging.yaml;h=e4de235ca543506063b7fb57c3d257f0b983abe3;hb=refs/heads/master#l346
>
> That systemd config is also exercised in tests for puppet-opendaylight, ansible-opendaylight, OPNFV Apex and other OPNFV installers.
>
> It seems like you've put some good thought into this, so if you have any suggestions for things we can do better please let us know. :)
>
> Daniel
>
> On Thu, Oct 12, 2017 at 11:47 AM Jamo Luhrsen <jluhr...@gmail.com> wrote:
>
> +Daniel and Integration-dev,
>
> Daniel,
>
> do our rpm package and the systemd work you did for it answer any of Muthu's questions below? I'm assuming it *IS* the answer, but you will know better.
>
> Thanks,
> JamO
>
> On 10/12/2017 04:56 AM, Muthukumaran K wrote:
> > Hi,
> >
> > *Context*: Figuring out the best possible way to gracefully shut down the Karaf process using standard Karaf commands.
> >
> > This is required because a framework-level shutdown sequence in Karaf gives the framework an opportunity to properly execute bundle lifecycle listeners.
> > What I mean is that an abrupt kill can potentially prevent lifecycle listeners from being properly executed and may also impact any in-flight transactions, which may be in various stages of the replication and/or commit phases. This can in turn lead to trouble during the recovery / restart phase.
> >
> > So, I thought of a middle ground where
> >
> > 1) We execute karaf stop, followed by
> >
> > 2) A periodic check of whether the last PID has indeed terminated
> >
> > Doing a straight kill -9 could lead to rare heisenbugs during recovery, since there may not be room for lifecycle listeners to execute (unless Karaf handles it as a unified shutdown hook and executes the same path as stop or any other graceful shutdown method).
> >
> > Has anybody tried any better methods without side effects?
> >
> > *The following option was tried; observations are as follows*
> >
> > Using Karaf stop followed by the Karaf status command to check whether the process has come to a graceful termination. But it appears that even though the ‘status’ command reports the Karaf instance as ‘Not Running’, the PID still lingers for roughly 2 to 3 mins in the ODL context.
> > I am inclined to think that there are indeed some lifecycle listeners executing. During this ‘PID lingering’ phase, the thread dump suggests that the System Bundle Shutdown thread is waiting for the Blueprint (BP) container to shut down the components (probably executing the lifecycle listeners at the application and platform levels).
> >
> > "System Bundle Shutdown" #1582 daemon prio=5 os_prio=0 tid=0x00007fb05003d800 nid=0xe68 waiting on condition [0x00007faf77678000]
> >    java.lang.Thread.State: TIMED_WAITING (parking)
> >     at sun.misc.Unsafe.park(Native Method)
> >     - parking to wait for <0x00000000e9064250> (a com.google.common.util.concurrent.AbstractFuture$Sync)
> >     at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> >     at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
> >     at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
> >     at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:268)
> >     at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:96)
> >     at org.opendaylight.openflowplugin.openflow.md.core.MDController.stop(MDController.java:358)
> >     at org.opendaylight.openflowplugin.openflow.md.core.sal.OpenflowPluginProvider.close(OpenflowPluginProvider.java:121)
> >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >     at java.lang.reflect.Method.invoke(Method.java:498)
> >     at org.apache.aries.blueprint.utils.ReflectionUtils.invoke(ReflectionUtils.java:299)
> >     at org.apache.aries.blueprint.container.BeanRecipe.invoke(BeanRecipe.java:980)
> >     at org.apache.aries.blueprint.container.BeanRecipe.destroy(BeanRecipe.java:887)
> >     at org.apache.aries.blueprint.container.BlueprintRepository.destroy(BlueprintRepository.java:329)
> >     at org.apache.aries.blueprint.container.BlueprintContainerImpl.destroyComponents(BlueprintContainerImpl.java:765)
> >     at org.apache.aries.blueprint.container.BlueprintContainerImpl.tidyupComponents(BlueprintContainerImpl.java:964)
> >     at org.apache.aries.blueprint.container.BlueprintContainerImpl.destroy(BlueprintContainerImpl.java:909)
> >     at org.apache.aries.blueprint.container.BlueprintExtender$3.run(BlueprintExtender.java:325)
> >     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> >     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> >     at org.apache.aries.blueprint.container.BlueprintExtender.destroyContainer(BlueprintExtender.java:346)
> >     at org.apache.aries.blueprint.container.BlueprintExtender.access$400(BlueprintExtender.java:68)
> >     at org.apache.aries.blueprint.container.BlueprintExtender$BlueprintContainerServiceImpl.destroyContainer(BlueprintExtender.java:624)
> >     at org.opendaylight.controller.blueprint.BlueprintBundleTracker.shutdownAllContainers(BlueprintBundleTracker.java:251)
> >     at org.opendaylight.controller.blueprint.BlueprintBundleTracker.bundleChanged(BlueprintBundleTracker.java:150)
> >     at org.eclipse.osgi.framework.internal.core.BundleContextImpl.dispatchEvent(BundleContextImpl.java:847)
> >     at org.eclipse.osgi.framework.eventmgr.EventManager.dispatchEvent(EventManager.java:230)
> >     at org.eclipse.osgi.framework.eventmgr.ListenerQueue.dispatchEventSynchronous(ListenerQueue.java:148)
> >     at org.eclipse.osgi.framework.internal.core.Framework.publishBundleEventPrivileged(Framework.java:1568)
> >     at org.eclipse.osgi.framework.internal.core.Framework.publishBundleEvent(Framework.java:1504)
> >     at org.eclipse.osgi.framework.internal.core.Framework.publishBundleEvent(Framework.java:1499)
> >     at org.eclipse.osgi.framework.internal.core.Framework.shutdown(Framework.java:681)
> >     - locked <0x000000008060b4d0> (a org.eclipse.osgi.framework.internal.core.Framework)
> >     at org.eclipse.osgi.framework.internal.core.Framework.close(Framework.java:600)
> >     - locked <0x000000008060b4d0> (a org.eclipse.osgi.framework.internal.core.Framework)
> >     at org.eclipse.osgi.framework.internal.core.InternalSystemBundle$1.run(InternalSystemBundle.java:261)
> >     at java.lang.Thread.run(Thread.java:745)
> >
> > "Framework Active Thread" #12 prio=5 os_prio=0 tid=0x00007fb0dc4bd000 nid=0x52a waiting for monitor entry [0x00007fb0c14b0000]
> >    java.lang.Thread.State: BLOCKED (on object monitor)
> >     at java.lang.Object.wait(Native Method)
> >     at org.eclipse.osgi.framework.internal.core.Framework.run(Framework.java:1862)
> >     - locked <0x000000008060b4d0> (a org.eclipse.osgi.framework.internal.core.Framework)
> >     at java.lang.Thread.run(Thread.java:745)
> >
> > "main" #1 prio=5 os_prio=0 tid=0x00007fb0dc00b800 nid=0x514 in Object.wait() [0x00007fb0e5134000]
> >    java.lang.Thread.State: WAITING (on object monitor)
> >     at java.lang.Object.wait(Native Method)
> >     - waiting on <0x000000008060b4d0> (a org.eclipse.osgi.framework.internal.core.Framework)
> >     at org.eclipse.osgi.framework.internal.core.Framework.waitForStop(Framework.java:1884)
> >     - locked <0x000000008060b4d0> (a org.eclipse.osgi.framework.internal.core.Framework)
> >     at org.eclipse.osgi.framework.internal.core.EquinoxLauncher.waitForStop(EquinoxLauncher.java:118)
> >     at org.eclipse.osgi.launch.Equinox.waitForStop(Equinox.java:182)
> >     at org.apache.karaf.main.Main.awaitShutdown(Main.java:487)
> >     at org.apache.karaf.main.Main.main(Main.java:177)
> >
> > Regards
> >
> > Muthu
> >
> > _______________________________________________
> > controller-dev mailing list
> > controller-dev@lists.opendaylight.org
> > https://lists.opendaylight.org/mailman/listinfo/controller-dev
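For reference, here is a minimal sketch of the "stop, then wait until the PID vanishes" approach discussed in this thread, with an upper bound on the wait. It is only an illustration under stated assumptions: the function names, the default timeout and poll interval, and the example path and PID in the usage note are hypothetical and not part of Karaf or ODL. The sketch deliberately does not escalate to kill -9 on timeout, because SIGKILL cannot be caught by the JVM, so no shutdown hooks or lifecycle listeners would run in that case.

```python
import os
import subprocess
import time


def pid_alive(pid: int) -> bool:
    """Return True if a process with this PID currently exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but is owned by another user
    return True


def stop_and_wait(stop_cmd, pid, timeout_s=300.0, poll_s=5.0,
                  is_alive=pid_alive, sleep=time.sleep, clock=time.monotonic):
    """Run the graceful stop command, then poll until `pid` is gone.

    Returns True if the process exited within `timeout_s` seconds and
    False if it was still present when the timeout expired.
    `is_alive`, `sleep` and `clock` are injectable to make testing easy.
    """
    # Ask for a graceful shutdown; the exit code is not treated as fatal here.
    subprocess.run(stop_cmd, check=False)
    deadline = clock() + timeout_s
    while is_alive(pid):
        if clock() >= deadline:
            return False  # still lingering; leave the decision to the operator
        sleep(poll_s)
    return True
```

A call could look like `stop_and_wait(["/opt/opendaylight/bin/stop"], pid=12345, timeout_s=600)`, where both the path and the PID are placeholders. Returning False instead of killing the process keeps the "what now" decision (collect a thread dump, wait longer, or force a kill) outside the helper.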