On 10/30/2017 01:29 PM, Tom Pantelis wrote:
> On Mon, Oct 30, 2017 at 4:25 PM, Sam Hague <sha...@redhat.com> wrote:
>> On Mon, Oct 30, 2017 at 3:02 PM, Tom Pantelis <tompante...@gmail.com> wrote:
>>> On Mon, Oct 30, 2017 at 2:49 PM, Michael Vorburger <vorbur...@redhat.com> wrote:
>>>> Hi Sam,
>>>>
>>>> On Mon, Oct 30, 2017 at 7:45 PM, Sam Hague <sha...@redhat.com> wrote:
>>>>> Stephen, Michael, Tom,
>>>>>
>>>>> do you have any ways to collect debugs when ODL crashes in CSIT?
>>>>
>>>> JVMs (almost) never "just crash" without a word... either some code does
>>>> java.lang.System.exit(), which you may remember we do in the CDS/Akka code
>>>> somewhere, or there's a bug in the JVM implementation - in which case there
>>>> should be one of those JVM crash log type things - a file named something
>>>> like hs_err_pid22607.log in the "current working" directory. Where would
>>>> that be on these CSIT runs, and are the CSIT JJB jobs set up to preserve
>>>> such JVM crash log files and copy them over to logs.opendaylight.org?
>>>
>>> Akka will do System.exit() if it encounters an error serious enough for
>>> that. But it doesn't do it silently. However, I believe we disabled the
>>> automatic exiting in akka.
>>
>> Should there be any logs in ODL for this? There is nothing in the karaf
>> log when this happens. It literally just stops.
>>
>> The karaf.console log does say the karaf process was killed:
>>
>> /tmp/karaf-0.7.1-SNAPSHOT/bin/karaf: line 422: 11528 Killed ${KARAF_EXEC}
>> "${JAVA}" ${JAVA_OPTS} "$NON_BLOCKING_PRNG"
>> -Djava.endorsed.dirs="${JAVA_ENDORSED_DIRS}"
>> -Djava.ext.dirs="${JAVA_EXT_DIRS}"
>> -Dkaraf.instances="${KARAF_HOME}/instances" -Dkaraf.home="${KARAF_HOME}"
>> -Dkaraf.base="${KARAF_BASE}"
>> -Dkaraf.data="${KARAF_DATA}" -Dkaraf.etc="${KARAF_ETC}"
>> -Dkaraf.restart.jvm.supported=true
>> -Djava.io.tmpdir="${KARAF_DATA}/tmp"
>> -Djava.util.logging.config.file="${KARAF_BASE}/etc/java.util.logging.properties"
>> ${KARAF_SYSTEM_OPTS} ${KARAF_OPTS} ${OPTS} "$@" -classpath "${CLASSPATH}"
>> ${MAIN}
>>
>> In the CSIT robot files we can see the connection errors below, so ODL is
>> not responding to new requests. This plus the above lead me to think ODL
>> just died.
>>
>> [ WARN ] Retrying (Retry(total=2, connect=None, read=None, redirect=None,
>> status=None)) after connection broken by
>> 'NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection
>> object at 0x5ca2d50>: Failed to establish a new
>> connection: [Errno 111] Connection refused',)'
>
> That would seem to indicate something did a kill -9. As Michael said, if the
> JVM crashed there would be an hs_err_pid file and it would log a message
> about it.
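
The two checks discussed in the quoted messages (is there an hs_err_pid file, and did the shell report the process as "Killed") could be automated roughly as follows. This is only a sketch: the function names, the return labels, and the idea of running it against an archived karaf.console log are assumptions for illustration, not part of any existing CSIT job. The regular expression assumes the line format shown in the karaf.console excerpt above.

```python
import re
from pathlib import Path

# The shell reports a child that died from a signal as
# "<script>: line <n>: <pid> <signal description> <command...>".
# "Killed" is the description used for SIGKILL (signal 9).
KILLED_LINE = re.compile(
    r"^(?P<script>.+?): line (?P<line>\d+):\s+(?P<pid>\d+)\s+Killed\b"
)


def parse_killed_line(text):
    """Return (script, line_no, pid) for the first 'Killed' report, else None."""
    for raw in text.splitlines():
        match = KILLED_LINE.match(raw.strip())
        if match:
            return (
                match.group("script"),
                int(match.group("line")),
                int(match.group("pid")),
            )
    return None


def find_hs_err_logs(directory):
    """List HotSpot fatal-error logs (hs_err_pid<pid>.log) under a directory."""
    return sorted(Path(directory).rglob("hs_err_pid*.log"))


def classify_exit(console_text, working_dir):
    """Rough triage (hypothetical labels): crash log, SIGKILL report, or unknown."""
    if find_hs_err_logs(working_dir):
        return "jvm-crash"
    if parse_killed_line(console_text):
        return "killed-by-signal"
    return "unknown"
```

A "killed-by-signal" result only says that the process received SIGKILL; it does not say who sent it, which is why OS-level logs are still needed.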
Yeah, this is where my money is at as well. The OS must be dumping it because
it's misbehaving. I'll try to hack the job to start collecting OS-level log
info (e.g. journalctl, etc.).

JamO

> _______________________________________________
> controller-dev mailing list
> controller-dev@lists.opendaylight.org
> https://lists.opendaylight.org/mailman/listinfo/controller-dev
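
One possible shape for the OS-level collection mentioned above, as a hedged sketch only. It assumes the kernel's out-of-memory killer is the likely sender of the SIGKILL, which the thread suspects but does not confirm. It also assumes the test VM uses systemd, so `journalctl -k` returns kernel messages. The function names are hypothetical, and the exact wording of OOM-killer messages varies by kernel version, so the pattern below only looks for the common "Killed process <pid> (<name>)" fragment.

```python
import re
import subprocess

# Fragment commonly found in Linux OOM-killer reports; the surrounding text
# differs between kernel versions, so only this part is matched.
OOM_KILL = re.compile(r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\)")


def read_kernel_log():
    """Return kernel messages from the systemd journal (requires journalctl)."""
    result = subprocess.run(
        ["journalctl", "-k", "--no-pager"],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


def find_oom_kills(kernel_log_text):
    """Return (pid, process_name) pairs for OOM-killer reports in the text."""
    return [
        (int(match.group("pid")), match.group("name"))
        for match in OOM_KILL.finditer(kernel_log_text)
    ]


def was_oom_killed(kernel_log_text, pid):
    """True if the given pid appears in an OOM-killer report."""
    return any(killed_pid == pid for killed_pid, _ in find_oom_kills(kernel_log_text))
```

For example, `was_oom_killed(read_kernel_log(), 11528)` would check the pid from the karaf.console line, provided it runs on the same VM before the journal rotates.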