On Fri, Oct 13, 2017 at 8:57 AM, Tom Pantelis <tompante...@gmail.com> wrote:

>
>
> On Fri, Oct 13, 2017 at 12:59 AM, Muthukumaran K <
> muthukumara...@ericsson.com> wrote:
>
>> Thanks a lot for the pointers Daniel and JamO.
>>
>>
>>
>> https://git.opendaylight.org/gerrit/gitweb?p=releng/builder.git;a=blob;f=jjb/packaging/stop-odl.sh;h=2e3e7bf15dfbe6e59bddfbfd4ce4805fb47b2a69;hb=refs/heads/master#l27
>> which aligns with my thinking too :)
>>
>>
>>
>> Just a clarification: can you recall any situation where the karaf PID
>> lingered abnormally long (beyond 10 to 15 mins) during the stop phase? I
>> have seen this once using the vanilla distro, but have not been able to
>> reproduce it for the past month or so, even after several day-to-day
>> restarts. Maybe it was a local env issue. So I was a bit hesitant to roll
>> the approach of stop followed by waiting until the PID vanishes into
>> production.
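
On the stop-then-wait approach: a minimal sketch of the "wait until the PID vanishes" step, with a timeout so a lingering PID cannot block a restart forever. This is only an illustration and not what stop-odl.sh does. It assumes a separate helper run on Java 9 or later (for `ProcessHandle`), and the class name `WaitForPidExit`, the method `waitForExit` and the 5 minute / 5 second values are made up for the example.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Optional;

public class WaitForPidExit {

    /**
     * Polls until the process with the given PID is gone or the timeout elapses.
     * Returns true if the process was gone before the deadline, false otherwise.
     */
    static boolean waitForExit(long pid, Duration timeout, Duration pollInterval)
            throws InterruptedException {
        Instant deadline = Instant.now().plus(timeout);
        while (true) {
            Optional<ProcessHandle> handle = ProcessHandle.of(pid);
            if (!handle.isPresent() || !handle.get().isAlive()) {
                return true; // PID no longer exists or the process has terminated
            }
            if (Instant.now().isAfter(deadline)) {
                return false; // still running; the caller decides what to do next
            }
            Thread.sleep(pollInterval.toMillis());
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.out.println("usage: WaitForPidExit <pid>");
            return;
        }
        long pid = Long.parseLong(args[0]);
        // Assumed values: wait up to 5 minutes, checking every 5 seconds.
        boolean exited = waitForExit(pid, Duration.ofMinutes(5), Duration.ofSeconds(5));
        System.out.println(exited ? "exited" : "still running");
    }
}
```

The helper would be run after `karaf stop` with the Karaf PID as its argument. What to do when it reports "still running" (alert, take a thread dump, or escalate to a kill) is a policy decision left to the caller.
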
>>
>>
>>
>> @Tom, @Robert,
>>
>>
>>
>> Not directly related but I will fire away …
>>
>>
>>
>> Formerly,
>> https://github.com/opendaylight/controller/blob/master/opendaylight/md-sal/sal-clustering-commons/src/main/java/org/opendaylight/controller/cluster/common/actor/QuarantinedMonitorActor.java
>> restarted the entire container, and now on master the Quarantined state
>> just restarts the ActorSystem. Is my understanding right?
>>
>
> It restarts the enclosing bundle:
>
> return QuarantinedMonitorActor.props(() -> {
>             // restart the entire karaf container
>             LOG.warn("Restarting karaf container");
>             System.setProperty("karaf.restart.jvm", "true");
>             bundleContext.getBundle().stop();
>         });
>
> It used to restart bundle 0. Not sure why that was changed....
>

Looks like this was inadvertently changed by
https://git.opendaylight.org/gerrit/#/c/62451/ - it used to be
     bundleContext.getBundle(0).stop();

If you want to push a patch to fix it, I'll merge it.
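
For reference, a minimal sketch of why the argument matters. In OSGi, `getBundle()` returns the bundle that owns the context, while `getBundle(0)` returns the system bundle, and stopping the system bundle shuts down the whole framework. The `Bundle` and `BundleContext` interfaces below are hypothetical stand-ins for the `org.osgi.framework` types, reduced to the calls used here; `FakeBundle`, `FakeContext`, `restartCallback` and the bundle id 42 are illustrative names and values, not code from the controller repo.

```java
import java.util.HashMap;
import java.util.Map;

public class SystemBundleStopSketch {

    // Hypothetical stand-ins for org.osgi.framework.Bundle and BundleContext,
    // reduced to the methods this sketch needs (the real stop() throws BundleException).
    interface Bundle {
        long getBundleId();
        void stop();
    }

    interface BundleContext {
        Bundle getBundle();          // the bundle that owns this context
        Bundle getBundle(long id);   // lookup by id; id 0 is the system bundle
    }

    static final class FakeBundle implements Bundle {
        private final long id;
        boolean stopped;

        FakeBundle(long id) { this.id = id; }

        @Override public long getBundleId() { return id; }
        @Override public void stop() { stopped = true; }
    }

    static final class FakeContext implements BundleContext {
        final Map<Long, FakeBundle> bundles = new HashMap<>();
        final FakeBundle owner;

        FakeContext(long ownerId) {
            bundles.put(0L, new FakeBundle(0L));   // system bundle
            owner = new FakeBundle(ownerId);       // the "enclosing" bundle
            bundles.put(ownerId, owner);
        }

        @Override public Bundle getBundle() { return owner; }
        @Override public Bundle getBundle(long id) { return bundles.get(id); }
    }

    // Mirrors the callback quoted above, but stops bundle 0 instead of the owner.
    static Runnable restartCallback(BundleContext ctx) {
        return () -> {
            System.setProperty("karaf.restart.jvm", "true");
            ctx.getBundle(0).stop();
        };
    }

    public static void main(String[] args) {
        FakeContext ctx = new FakeContext(42L);
        restartCallback(ctx).run();
        System.out.println("system bundle stopped: " + ctx.bundles.get(0L).stopped);
        System.out.println("owning bundle stopped: " + ctx.owner.stopped);
    }
}
```

With `getBundle()` in place of `getBundle(0)`, only the owning bundle in this sketch would be marked stopped, which matches the behaviour described above.
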


>
>
>>
>> Regards
>>
>> Muthu
>>
>>
>>
>>
>>
>>
>>
>> *From:* Daniel Farrell [mailto:dfarr...@redhat.com]
>> *Sent:* Friday, October 13, 2017 6:19 AM
>> *To:* Jamo Luhrsen; Muthukumaran K; controller-dev@lists.opendaylight.org;
>> integration-...@lists.opendaylight.org
>> *Subject:* Re: [controller-dev] Best way to gracefully shutdown Karaf in
>> ODL context
>>
>>
>>
>> Hey Muthu,
>>
>>
>>
>> Yes, I think you should take a look at the systemd configuration we ship
>> in ODL's packages. As far as I know it does a good job of
>> starting/stopping/restarting ODL's service.
>>
>>
>>
>> https://git.opendaylight.org/gerrit/gitweb?p=integration/packaging.git;a=blob;f=packages/rpm/unitfiles/opendaylight.service;h=ac436592d2880047986b856c7dd6810665ba0d3e;hb=refs/heads/master
>>
>>
>>
>> Here's a Nitrogen RPM that contains that systemd config:
>>
>>
>>
>> http://cbs.centos.org/repos/nfv7-opendaylight-70-release/x86_64/os/Packages/opendaylight-7.0.0-1.el7.noarch.rpm
>>
>>
>>
>> This test job shows examples of `sudo systemctl [start, stop, status]`
>> working:
>>
>>
>>
>> https://jenkins.opendaylight.org/releng/job/packaging-test-rpm-master
>>
>>
>>
>> The logic for that job is here:
>>
>>
>>
>> https://git.opendaylight.org/gerrit/gitweb?p=releng/builder.git;a=blob;f=jjb/packaging/packaging.yaml;h=e4de235ca543506063b7fb57c3d257f0b983abe3;hb=refs/heads/master#l346
>>
>>
>>
>> That systemd config is also exercised in tests for puppet-opendaylight,
>> ansible-opendaylight, OPNFV Apex and other OPNFV installers.
>>
>>
>>
>> It seems like you've put some good thought into this, so if you have any
>> suggestions for things we can do better please let us know. :)
>>
>>
>>
>> Daniel
>>
>>
>>
>> On Thu, Oct 12, 2017 at 11:47 AM Jamo Luhrsen <jluhr...@gmail.com> wrote:
>>
>> +Daniel and Integration-dev,
>>
>> Daniel,
>>
>> does our rpm package and the systemd work you did for it answer any of
>> Muthu's questions below? I'm assuming it *IS* the answer, but you will
>> know better.
>>
>> Thanks,
>> JamO
>>
>> On 10/12/2017 04:56 AM, Muthukumaran K wrote:
>> > Hi,
>> >
>> >
>> > *Context*: Figuring out the best possible way to gracefully shut down the
>> > Karaf process using standard Karaf commands.
>> >
>> > This is required because a framework-level shutdown sequence in Karaf
>> > gives the framework the opportunity to properly execute bundle lifecycle
>> > listeners. What I mean is: an abrupt kill can potentially prevent
>> > lifecycle listeners from being properly executed and may also impact any
>> > in-flight transactions which may be in various stages of replication
>> > and/or commit. This can in turn lead to trouble during the recovery /
>> > restart phase.
>> >
>> >
>> >
>> > So, I thought of a middle ground where
>> >
>> > 1)      We execute karaf stop, followed by
>> >
>> > 2)      A periodic check of whether the PID has indeed terminated
>> >
>> >
>> >
>> > Doing a straight kill -9 could lead to rare heisenbugs wherein recovery
>> > suffers, since there may be no room for lifecycle listeners to execute
>> > (unless Karaf handles it via a unified shutdown hook and executes the same
>> > path as stop or any other graceful shutdown method).
>> >
>> >
>> >
>> > Has anybody tried any better methods without side effects?
>> >
>> >
>> >
>> >
>> >
>> > *One option was tried; the observation is as follows*
>> >
>> > Using Karaf stop followed by the Karaf status command to check whether the
>> > process has terminated gracefully. But it appears that although the
>> > ‘status’ command reports the Karaf instance as ‘Not Running’, the PID
>> > still lingers for roughly 2 to 3 mins in the ODL context. I am inclined to
>> > think that some lifecycle listeners are indeed still executing. During
>> > this ‘PID lingering’ phase, the thread dump hints that the System Bundle
>> > Shutdown thread is waiting for the BP container to shut down the
>> > components (probably executing the lifecycle listeners at application and
>> > platform levels):
>> >
>> >
>> >
>> > "System Bundle Shutdown" #1582 daemon prio=5 os_prio=0 tid=0x00007fb05003d800 nid=0xe68 waiting on condition [0x00007faf77678000]
>> >    java.lang.Thread.State: TIMED_WAITING (parking)
>> >         at sun.misc.Unsafe.park(Native Method)
>> >         - parking to wait for  <0x00000000e9064250> (a com.google.common.util.concurrent.AbstractFuture$Sync)
>> >         at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
>> >         at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
>> >         at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
>> >         at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:268)
>> >         at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:96)
>> >         at org.opendaylight.openflowplugin.openflow.md.core.MDController.stop(MDController.java:358)
>> >         at org.opendaylight.openflowplugin.openflow.md.core.sal.OpenflowPluginProvider.close(OpenflowPluginProvider.java:121)
>> >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> >         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>> >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> >         at java.lang.reflect.Method.invoke(Method.java:498)
>> >         at org.apache.aries.blueprint.utils.ReflectionUtils.invoke(ReflectionUtils.java:299)
>> >         at org.apache.aries.blueprint.container.BeanRecipe.invoke(BeanRecipe.java:980)
>> >         at org.apache.aries.blueprint.container.BeanRecipe.destroy(BeanRecipe.java:887)
>> >         at org.apache.aries.blueprint.container.BlueprintRepository.destroy(BlueprintRepository.java:329)
>> >         at org.apache.aries.blueprint.container.BlueprintContainerImpl.destroyComponents(BlueprintContainerImpl.java:765)
>> >         at org.apache.aries.blueprint.container.BlueprintContainerImpl.tidyupComponents(BlueprintContainerImpl.java:964)
>> >         at org.apache.aries.blueprint.container.BlueprintContainerImpl.destroy(BlueprintContainerImpl.java:909)
>> >         at org.apache.aries.blueprint.container.BlueprintExtender$3.run(BlueprintExtender.java:325)
>> >         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>> >         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>> >         at org.apache.aries.blueprint.container.BlueprintExtender.destroyContainer(BlueprintExtender.java:346)
>> >         at org.apache.aries.blueprint.container.BlueprintExtender.access$400(BlueprintExtender.java:68)
>> >         at org.apache.aries.blueprint.container.BlueprintExtender$BlueprintContainerServiceImpl.destroyContainer(BlueprintExtender.java:624)
>> >         at org.opendaylight.controller.blueprint.BlueprintBundleTracker.shutdownAllContainers(BlueprintBundleTracker.java:251)
>> >         at org.opendaylight.controller.blueprint.BlueprintBundleTracker.bundleChanged(BlueprintBundleTracker.java:150)
>> >         at org.eclipse.osgi.framework.internal.core.BundleContextImpl.dispatchEvent(BundleContextImpl.java:847)
>> >         at org.eclipse.osgi.framework.eventmgr.EventManager.dispatchEvent(EventManager.java:230)
>> >         at org.eclipse.osgi.framework.eventmgr.ListenerQueue.dispatchEventSynchronous(ListenerQueue.java:148)
>> >         at org.eclipse.osgi.framework.internal.core.Framework.publishBundleEventPrivileged(Framework.java:1568)
>> >         at org.eclipse.osgi.framework.internal.core.Framework.publishBundleEvent(Framework.java:1504)
>> >         at org.eclipse.osgi.framework.internal.core.Framework.publishBundleEvent(Framework.java:1499)
>> >         at org.eclipse.osgi.framework.internal.core.Framework.shutdown(Framework.java:681)
>> >         - locked <0x000000008060b4d0> (a org.eclipse.osgi.framework.internal.core.Framework)
>> >         at org.eclipse.osgi.framework.internal.core.Framework.close(Framework.java:600)
>> >         - locked <0x000000008060b4d0> (a org.eclipse.osgi.framework.internal.core.Framework)
>> >         at org.eclipse.osgi.framework.internal.core.InternalSystemBundle$1.run(InternalSystemBundle.java:261)
>> >         at java.lang.Thread.run(Thread.java:745)
>> >
>> >
>> >
>> > "Framework Active Thread" #12 prio=5 os_prio=0 tid=0x00007fb0dc4bd000 nid=0x52a waiting for monitor entry [0x00007fb0c14b0000]
>> >    java.lang.Thread.State: BLOCKED (on object monitor)
>> >         at java.lang.Object.wait(Native Method)
>> >         at org.eclipse.osgi.framework.internal.core.Framework.run(Framework.java:1862)
>> >         - locked <0x000000008060b4d0> (a org.eclipse.osgi.framework.internal.core.Framework)
>> >         at java.lang.Thread.run(Thread.java:745)
>> >
>> >
>> >
>> > "main" #1 prio=5 os_prio=0 tid=0x00007fb0dc00b800 nid=0x514 in Object.wait() [0x00007fb0e5134000]
>> >    java.lang.Thread.State: WAITING (on object monitor)
>> >         at java.lang.Object.wait(Native Method)
>> >         - waiting on <0x000000008060b4d0> (a org.eclipse.osgi.framework.internal.core.Framework)
>> >         at org.eclipse.osgi.framework.internal.core.Framework.waitForStop(Framework.java:1884)
>> >         - locked <0x000000008060b4d0> (a org.eclipse.osgi.framework.internal.core.Framework)
>> >         at org.eclipse.osgi.framework.internal.core.EquinoxLauncher.waitForStop(EquinoxLauncher.java:118)
>> >         at org.eclipse.osgi.launch.Equinox.waitForStop(Equinox.java:182)
>> >         at org.apache.karaf.main.Main.awaitShutdown(Main.java:487)
>> >         at org.apache.karaf.main.Main.main(Main.java:177)
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> > Regards
>> >
>> > Muthu
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> > _______________________________________________
>> > controller-dev mailing list
>> > controller-dev@lists.opendaylight.org
>> > https://lists.opendaylight.org/mailman/listinfo/controller-dev
>> >
>>
>>
>>
>>
>