On Fri, Oct 13, 2017 at 8:57 AM, Tom Pantelis <tompante...@gmail.com> wrote:
>
> On Fri, Oct 13, 2017 at 12:59 AM, Muthukumaran K <muthukumara...@ericsson.com> wrote:
>
>> Thanks a lot for the pointers Daniel and JamO.
>>
>> https://git.opendaylight.org/gerrit/gitweb?p=releng/builder.git;a=blob;f=jjb/packaging/stop-odl.sh;h=2e3e7bf15dfbe6e59bddfbfd4ce4805fb47b2a69;hb=refs/heads/master#l27
>> aligns with my thinking too. :)
>>
>> Just a clarification: can you recollect any situation where the karaf PID
>> lingered abnormally long (beyond 10-15 mins) during the stop phase? I have
>> seen this once using the vanilla distro but have not been able to repro it
>> for the past month or so, even after several day-to-day restarts. Maybe it
>> was a local env issue. So I was a bit reserved about rolling the approach
>> of stop followed by waiting until the PID vanishes into production.
>>
>> @Tom, @Robert,
>>
>> Not directly related but I will fire away …
>>
>> Erstwhile
>> https://github.com/opendaylight/controller/blob/master/opendaylight/md-sal/sal-clustering-commons/src/main/java/org/opendaylight/controller/cluster/common/actor/QuarantinedMonitorActor.java
>> used to restart the entire container, and now on master the Quarantined
>> state just restarts the ActorSystem. Is my understanding right?
>>
>
> It restarts the enclosing bundle:
>
>     return QuarantinedMonitorActor.props(() -> {
>         // restart the entire karaf container
>         LOG.warn("Restarting karaf container");
>         System.setProperty("karaf.restart.jvm", "true");
>         bundleContext.getBundle().stop();
>     });
>
> It used to restart bundle 0. Not sure why that was changed....
>

Looks like this was inadvertently changed by
https://git.opendaylight.org/gerrit/#/c/62451/ - it used to be

    bundleContext.getBundle(0).stop();

If you want to push a patch to fix it, I'll merge it.

>
>> Regards
>> Muthu
>>
>> *From:* Daniel Farrell [mailto:dfarr...@redhat.com]
>> *Sent:* Friday, October 13, 2017 6:19 AM
>> *To:* Jamo Luhrsen; Muthukumaran K; controller-dev@lists.opendaylight.org;
>> integration-...@lists.opendaylight.org
>> *Subject:* Re: [controller-dev] Best way to gracefully shutdown Karaf in
>> ODL context
>>
>> Hey Muthu,
>>
>> Yes, I think you should take a look at the systemd configuration we ship
>> in ODL's packages. As far as I know it does a good job of
>> starting/stopping/restarting ODL's service.
>>
>> https://git.opendaylight.org/gerrit/gitweb?p=integration/packaging.git;a=blob;f=packages/rpm/unitfiles/opendaylight.service;h=ac436592d2880047986b856c7dd6810665ba0d3e;hb=refs/heads/master
>>
>> Here's a Nitrogen RPM that contains that systemd config:
>>
>> http://cbs.centos.org/repos/nfv7-opendaylight-70-release/x86_64/os/Packages/opendaylight-7.0.0-1.el7.noarch.rpm
>>
>> This test job shows examples of `sudo systemctl [start, stop, status]`
>> working:
>>
>> https://jenkins.opendaylight.org/releng/job/packaging-test-rpm-master
>>
>> The logic for that job is here:
>>
>> https://git.opendaylight.org/gerrit/gitweb?p=releng/builder.git;a=blob;f=jjb/packaging/packaging.yaml;h=e4de235ca543506063b7fb57c3d257f0b983abe3;hb=refs/heads/master#l346
>>
>> That systemd config is also exercised in tests for puppet-opendaylight,
>> ansible-opendaylight, OPNFV Apex and other OPNFV installers.
>>
>> It seems like you've put some good thought into this, so if you have any
>> suggestions for things we can do better please let us know. :)
>>
>> Daniel
>>
>> On Thu, Oct 12, 2017 at 11:47 AM Jamo Luhrsen <jluhr...@gmail.com> wrote:
>>
>> +Daniel and Integration-dev,
>>
>> Daniel,
>>
>> does our rpm package and the systemd work you did for it answer any of
>> Muthu's questions below?
>> I'm assuming it *IS* the answer, but you will know better.
>>
>> Thanks,
>> JamO
>>
>> On 10/12/2017 04:56 AM, Muthukumaran K wrote:
>> > Hi,
>> >
>> > *Context*: Figuring out the best possible way to gracefully shut down
>> > the Karaf process using standard Karaf commands.
>> >
>> > This is required because a framework-level shutdown sequence in Karaf
>> > gives the framework the opportunity to properly execute bundle
>> > lifecycle listeners. What I mean is: an abrupt kill can potentially
>> > prevent lifecycle listeners from being properly executed and may also
>> > impact any in-flight transactions which may be in various stages of
>> > replication and/or commit phases. This can in turn lead to trouble
>> > during the recovery / restart phase.
>> >
>> > So, I thought of a middle ground where
>> >
>> > 1) We execute karaf stop, followed by
>> > 2) A periodic check that the last PID indeed terminates
>> >
>> > Doing a straight kill -9 could lead to rare heisenbugs wherein recovery
>> > could suffer, since there may not be room for lifecycle listeners to
>> > execute (unless Karaf handles it as a unified shutdown hook and executes
>> > the same path as that of stop or any graceful shutdown method).
>> >
>> > Has anybody tried any better methods without side effects?
>> >
>> > *Option tried, and the observation is as follows*
>> >
>> > Using Karaf stop followed by the Karaf status command to check whether
>> > the process has come to a graceful termination. But it appears that
>> > though the ‘status’ command reports the Karaf instance as ‘Not Running’,
>> > the PID still lingers for roughly 2 to 3 mins in the ODL context.
>> > I am biased to think that there are indeed some lifecycle listeners
>> > executing … During this ‘PID lingering’ phase, the thread-dump hints
>> > the System Bundle Shutdown is waiting for the BP container to shutdown
>> > the components (probably executing the lifecycle listeners at
>> > application and platform levels)
>> >
>> > "System Bundle Shutdown" #1582 daemon prio=5 os_prio=0 tid=0x00007fb05003d800 nid=0xe68 waiting on condition [0x00007faf77678000]
>> >    java.lang.Thread.State: TIMED_WAITING (parking)
>> >         at sun.misc.Unsafe.park(Native Method)
>> >         - parking to wait for <0x00000000e9064250> (a com.google.common.util.concurrent.AbstractFuture$Sync)
>> >         at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>> >         at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
>> >         at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
>> >         at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:268)
>> >         at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:96)
>> >         at org.opendaylight.openflowplugin.openflow.md.core.MDController.stop(MDController.java:358)
>> >         at org.opendaylight.openflowplugin.openflow.md.core.sal.OpenflowPluginProvider.close(OpenflowPluginProvider.java:121)
>> >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> >         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>> >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> >         at java.lang.reflect.Method.invoke(Method.java:498)
>> >         at org.apache.aries.blueprint.utils.ReflectionUtils.invoke(ReflectionUtils.java:299)
>> >         at org.apache.aries.blueprint.container.BeanRecipe.invoke(BeanRecipe.java:980)
>> >         at org.apache.aries.blueprint.container.BeanRecipe.destroy(BeanRecipe.java:887)
>> >         at org.apache.aries.blueprint.container.BlueprintRepository.destroy(BlueprintRepository.java:329)
>> >         at org.apache.aries.blueprint.container.BlueprintContainerImpl.destroyComponents(BlueprintContainerImpl.java:765)
>> >         at org.apache.aries.blueprint.container.BlueprintContainerImpl.tidyupComponents(BlueprintContainerImpl.java:964)
>> >         at org.apache.aries.blueprint.container.BlueprintContainerImpl.destroy(BlueprintContainerImpl.java:909)
>> >         at org.apache.aries.blueprint.container.BlueprintExtender$3.run(BlueprintExtender.java:325)
>> >         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>> >         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>> >         at org.apache.aries.blueprint.container.BlueprintExtender.destroyContainer(BlueprintExtender.java:346)
>> >         at org.apache.aries.blueprint.container.BlueprintExtender.access$400(BlueprintExtender.java:68)
>> >         at org.apache.aries.blueprint.container.BlueprintExtender$BlueprintContainerServiceImpl.destroyContainer(BlueprintExtender.java:624)
>> >         at org.opendaylight.controller.blueprint.BlueprintBundleTracker.shutdownAllContainers(BlueprintBundleTracker.java:251)
>> >         at org.opendaylight.controller.blueprint.BlueprintBundleTracker.bundleChanged(BlueprintBundleTracker.java:150)
>> >         at org.eclipse.osgi.framework.internal.core.BundleContextImpl.dispatchEvent(BundleContextImpl.java:847)
>> >         at org.eclipse.osgi.framework.eventmgr.EventManager.dispatchEvent(EventManager.java:230)
>> >         at org.eclipse.osgi.framework.eventmgr.ListenerQueue.dispatchEventSynchronous(ListenerQueue.java:148)
>> >         at org.eclipse.osgi.framework.internal.core.Framework.publishBundleEventPrivileged(Framework.java:1568)
>> >         at org.eclipse.osgi.framework.internal.core.Framework.publishBundleEvent(Framework.java:1504)
>> >         at org.eclipse.osgi.framework.internal.core.Framework.publishBundleEvent(Framework.java:1499)
>> >         at org.eclipse.osgi.framework.internal.core.Framework.shutdown(Framework.java:681)
>> >         - locked <0x000000008060b4d0> (a org.eclipse.osgi.framework.internal.core.Framework)
>> >         at org.eclipse.osgi.framework.internal.core.Framework.close(Framework.java:600)
>> >         - locked <0x000000008060b4d0> (a org.eclipse.osgi.framework.internal.core.Framework)
>> >         at org.eclipse.osgi.framework.internal.core.InternalSystemBundle$1.run(InternalSystemBundle.java:261)
>> >         at java.lang.Thread.run(Thread.java:745)
>> >
>> > "Framework Active Thread" #12 prio=5 os_prio=0 tid=0x00007fb0dc4bd000 nid=0x52a waiting for monitor entry [0x00007fb0c14b0000]
>> >    java.lang.Thread.State: BLOCKED (on object monitor)
>> >         at java.lang.Object.wait(Native Method)
>> >         at org.eclipse.osgi.framework.internal.core.Framework.run(Framework.java:1862)
>> >         - locked <0x000000008060b4d0> (a org.eclipse.osgi.framework.internal.core.Framework)
>> >         at java.lang.Thread.run(Thread.java:745)
>> >
>> > "main" #1 prio=5 os_prio=0 tid=0x00007fb0dc00b800 nid=0x514 in Object.wait() [0x00007fb0e5134000]
>> >    java.lang.Thread.State: WAITING (on object monitor)
>> >         at java.lang.Object.wait(Native Method)
>> >         - waiting on <0x000000008060b4d0> (a org.eclipse.osgi.framework.internal.core.Framework)
>> >         at org.eclipse.osgi.framework.internal.core.Framework.waitForStop(Framework.java:1884)
>> >         - locked <0x000000008060b4d0> (a org.eclipse.osgi.framework.internal.core.Framework)
>> >         at org.eclipse.osgi.framework.internal.core.EquinoxLauncher.waitForStop(EquinoxLauncher.java:118)
>> >         at org.eclipse.osgi.launch.Equinox.waitForStop(Equinox.java:182)
>> >         at org.apache.karaf.main.Main.awaitShutdown(Main.java:487)
>> >         at org.apache.karaf.main.Main.main(Main.java:177)
>> >
>> > Regards
>> > Muthu
>> >
>> > _______________________________________________
>> > controller-dev mailing list
>> > controller-dev@lists.opendaylight.org
>> > https://lists.opendaylight.org/mailman/listinfo/controller-dev
>>
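
On the "stop, then wait until the PID vanishes" idea from the original mail:
below is a minimal sketch of just the waiting half, not anything that ships
in ODL or Karaf. It assumes Java 9 or later (for ProcessHandle), that the
stop command has already been issued separately, and that the caller knows
the Karaf PID. The class name WaitForPidExit, the method waitForExit and the
timeout values are hypothetical names picked for illustration.

```java
import java.time.Duration;
import java.util.Optional;

public class WaitForPidExit {

    /**
     * Polls until the process with the given PID is gone or the timeout
     * elapses. Returns true if the process disappeared in time, false on
     * timeout or if the waiting thread was interrupted.
     */
    public static boolean waitForExit(long pid, Duration timeout, Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            // ProcessHandle.of returns an empty Optional when no process has this PID.
            Optional<ProcessHandle> handle = ProcessHandle.of(pid);
            if (!handle.isPresent() || !handle.get().isAlive()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                // Preserve the interrupt for the caller and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.out.println("usage: java WaitForPidExit <pid>");
            return;
        }
        long pid = Long.parseLong(args[0]);
        // Assumed values: allow 5 minutes (the thread reports 2 to 3 minutes
        // of lingering) and check once per second.
        boolean gone = waitForExit(pid, Duration.ofMinutes(5), Duration.ofSeconds(1));
        System.out.println(gone
                ? "PID " + pid + " has exited"
                : "PID " + pid + " is still alive after the timeout");
    }
}
```

What to do on timeout (escalate to a kill, or alert and leave the process
alone) is deliberately left to the caller, since that is exactly the
trade-off being discussed above. Note also that a PID can in principle be
reused by an unrelated process, so a production version would probably want
to check the process start time or command line as well.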