On Fri, Feb 16, 2018 at 11:48 AM, Tom Pantelis <tompante...@gmail.com> wrote:

> I inadvertently unicasted - adding the DLs back.
>
> On Fri, Feb 16, 2018 at 2:44 PM, Ajay Lele <ajaysl...@gmail.com> wrote:
>
>> On Fri, Feb 16, 2018 at 11:38 AM, Tom Pantelis <tompante...@gmail.com> wrote:
>>
>>> On Fri, Feb 16, 2018 at 2:35 PM, Jamo Luhrsen <jluhr...@gmail.com> wrote:
>>>
>>>> On 2/16/18 11:33 AM, Tom Pantelis wrote:
>>>>
>>>>> On Fri, Feb 16, 2018 at 2:26 PM, Jamo Luhrsen <jluhr...@gmail.com> wrote:
>>>>>
>>>>>> I'm analyzing CSIT failures for our Carbon SR3 candidate.
>>>>>>
>>>>>> Something nasty went wrong in a netvirt CSIT job in the middle of
>>>>>> the robot tests. It seems all functionality is probably broken
>>>>>> after that.
>>>>>>
>>>>>> In the karaf.log [0] I see a message about an akka circuit breaker
>>>>>> that timed out, then a bunch of RuntimeExceptions: "Transaction
>>>>>> aborted due to shutdown".
>>>>>>
>>>>>> Any ideas what's happening here?
>>>>>>
>>>>>> Thanks,
>>>>>> JamO
>>>>>>
>>>>>> [0] https://logs.opendaylight.org/releng/vex-yul-odl-jenkins-1/netvirt-csit-1node-openstack-pike-upstream-stateful-snat-conntrack-carbon/200/odl_1/odl1_karaf.log.gz
>>>>>
>>>>> Yeah, that means akka persistence failed, i.e. it timed out waiting
>>>>> for the data to be written to disk. That kills the shard actor with
>>>>> no recovery. This can happen if there's slow disk access/contention
>>>>> in the env - we've seen this happen before with the internal CSIT
>>>>> env, before its disk issue was resolved.
>>>>
>>>> Thanks. I'll report to the infra guys that we are still likely seeing
>>>> some high disk IO latency. There was another job with similar issues.
>>>>
>>>> JamO
>>>
>>> The timeout(s) can be increased in the akka.conf (I would have to look
>>> them up) if it's really problematic, although that's really just a
>>> band-aid.
>>
>> Tom - can some logic be put in place to recover from this failure (e.g.
>> eventually restarting the stopped shard)? Otherwise the controller has
>> to be restarted to get it out of this state. We had seen this in live
>> envs also, under high load, and reported it [0].
>>
>> [0] https://jira.opendaylight.org/browse/CONTROLLER-1789
>
> Not sure - maybe.

Thx Tom.
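[Editor's note: Tom mentions above that the timeout(s) can be increased in akka.conf as a band-aid. The failure logged as "Circuit Breaker Timed out" most likely comes from Akka persistence's journal circuit breaker, so a sketch of such an override follows. The setting names are from Akka's persistence reference.conf; the surrounding `odl-cluster-data` wrapper, the exact file location, and the raised values are assumptions to verify against your ODL distribution.]

```hocon
# Sketch of an akka.conf override for the persistence journal circuit
# breaker (assumed file: configuration/initial/akka.conf in the ODL
# distribution -- verify against your install).
odl-cluster-data {
  akka {
    persistence {
      # Fallback settings applied to whichever journal plugin is configured.
      journal-plugin-fallback {
        circuit-breaker {
          max-failures = 10    # failed writes before the breaker opens (Akka default)
          call-timeout = 30s   # assumed raise from the 10s default; this is the
                               # write timeout whose expiry kills the shard actor
          reset-timeout = 60s  # how long the breaker stays open before retrying
        }
      }
    }
  }
}
```

As noted in the thread, raising these values only masks slow storage: if persistence actually fails, the shard actor is still stopped with no recovery (see CONTROLLER-1789).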
_______________________________________________
controller-dev mailing list
controller-dev@lists.opendaylight.org
https://lists.opendaylight.org/mailman/listinfo/controller-dev