On 07/02/2018 11:44 AM, Tom Pantelis wrote:
On Mon, Jul 2, 2018 at 2:15 PM, Victor Pickard <vpick...@redhat.com> wrote:
Hi all,
I'm looking at clustering stability. One of the jobs I've been looking at
is controller clustering. This is a good CSIT, in that it stops and starts
ODL several times during the run.
In one of the failed test runs (sandbox, logs wiped from last week, but I do
have this particular karaf log archived locally), ODL is started, and REST
calls fail during the test. Looking at the logs, I can see why: Karaf failed
to start, or rather took a really long time to start. From the snippet below,
you can see about 7 minutes between when Karaf launched and when it did
something, possibly restarting again. The main point is that Karaf failed to
start in a timely manner, taking over 7 minutes before it began starting up
blueprints, etc.
Vic,
When you have a sandbox job whose logs you want to keep around, write
"copy-logs: <job-name>/<job-number>" on any gerrit. That will trigger a job
to copy the logs to the logs server, where we can keep them for 6 months.
Also, is this the failure behind the high-level robot failure where some
node did not move to cluster syncstatus == true within 5 minutes? Not
coming up for 7 minutes would easily explain that.
Do we have a jira for this one yet?
Thanks,
JamO
I ran a job that had karaf debug logging enabled with this setting:
log4j.rootLogger=DEBUG
This did not go very well. It generated way too much debug info, which
caused timeouts and various other errors in the CSIT run.
So, my questions are:
1. Has anyone seen this issue where karaf seems to hang on startup (after a
kill -9 on the karaf pid)? If so, is this a known issue?
2. What debug would be needed to figure out why karaf was hanging? Note the
setting above generated a log file of ~768 MB in a very short timespan.
Vic - does this happen if you gracefully shut it down? In years past with karaf I recall corruption could occur in the
bundle cache under data if the karaf process was killed. I don't know if that potential issue is still present with
karaf 4. Does it clean the data dir before restarting? If not, it would be good to do so to be safe.
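As a rough illustration of cleaning the data dir before a restart, here is a minimal Python sketch. It assumes the bundle cache lives in a `data` directory directly under the Karaf install directory, as described above; the function name `clean_karaf_data` is made up for this example and is not part of any CSIT library.

```python
import shutil
from pathlib import Path


def clean_karaf_data(karaf_home: str) -> bool:
    """Remove <karaf_home>/data if it exists.

    Returns True if the directory was removed, False if there was
    nothing to remove. Assumption: the bundle cache lives under 'data'.
    """
    data_dir = Path(karaf_home) / "data"
    if not data_dir.is_dir():
        return False
    shutil.rmtree(data_dir)
    return True
```

This should only run while karaf is stopped, i.e. between the kill and the next start.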
Other than that, we probably need to get a thread dump.
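One possible way to collect thread dumps while karaf appears hung is to call the JDK's `jstack` tool a few times in a row, so that threads stuck in the same place stand out. The sketch below assumes `jstack` is on the PATH and that the karaf pid is already known; the helper names, the count, and the interval are illustrative choices only.

```python
import subprocess
import time
from typing import Callable, List


def jstack_command(pid: int) -> List[str]:
    """Build the jstack invocation; -l adds lock information to the dump."""
    return ["jstack", "-l", str(pid)]


def capture_thread_dumps(
    pid: int,
    count: int = 3,
    interval_s: float = 5.0,
    run: Callable = subprocess.run,
) -> List[str]:
    """Take `count` thread dumps of `pid`, sleeping `interval_s` between them.

    `run` is injectable so the function can be exercised without a JVM.
    """
    dumps = []
    for i in range(count):
        result = run(jstack_command(pid), capture_output=True, text=True, check=False)
        dumps.append(result.stdout)
        if i < count - 1:
            time.sleep(interval_s)
    return dumps
```

Comparing consecutive dumps is usually more telling than a single one, since a thread that sits in the same stack across all of them is a good candidate for whatever is blocking startup.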
Thanks,
Vic
Jun 29, 2018 3:43:47 PM org.apache.karaf.main.Main launch
INFO: Installing and starting initial bundles
Jun 29, 2018 3:43:47 PM org.apache.karaf.main.Main launch
INFO: All initial bundles installed and set to start
Jun 29, 2018 3:43:47 PM org.apache.karaf.main.lock.SimpleFileLock lock
INFO: Trying to lock /tmp/karaf-0.8.3-SNAPSHOT/lock
Jun 29, 2018 3:43:47 PM org.apache.karaf.main.lock.SimpleFileLock lock
INFO: Lock acquired
Jun 29, 2018 3:43:47 PM org.apache.karaf.main.Main$KarafLockCallback lockAquired
INFO: Lock acquired. Setting startlevel to 100
Jun 29, 2018 3:50:48 PM org.apache.karaf.main.Main launch
INFO: Installing and starting initial bundles
Jun 29, 2018 3:50:49 PM org.apache.karaf.main.Main launch
INFO: All initial bundles installed and set to start
Jun 29, 2018 3:50:49 PM org.apache.karaf.main.lock.SimpleFileLock lock
INFO: Trying to lock /tmp/karaf-0.8.3-SNAPSHOT/lock
Jun 29, 2018 3:50:49 PM org.apache.karaf.main.lock.SimpleFileLock lock
INFO: Lock acquired
Jun 29, 2018 3:50:49 PM org.apache.karaf.main.Main$KarafLockCallback lockAquired
INFO: Lock acquired. Setting startlevel to 100
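The gap in the excerpt above can also be found mechanically, which may help when scanning many archived karaf logs for the same symptom. This is only a sketch: it assumes the java.util.logging timestamp format shown in the log ("Jun 29, 2018 3:43:47 PM") and an English locale, and the function names are invented for illustration.

```python
import re
from datetime import datetime

# Matches timestamps such as "Jun 29, 2018 3:43:47 PM".
TIMESTAMP = re.compile(r"[A-Z][a-z]{2} \d{1,2}, \d{4} \d{1,2}:\d{2}:\d{2} [AP]M")


def parse_timestamps(log_text):
    """Return every timestamp found in the log text, in order of appearance."""
    return [
        datetime.strptime(stamp, "%b %d, %Y %I:%M:%S %p")
        for stamp in TIMESTAMP.findall(log_text)
    ]


def largest_gap(log_text):
    """Return (gap_seconds, earlier, later) for the biggest jump between entries."""
    stamps = parse_timestamps(log_text)
    best = (0.0, None, None)
    for earlier, later in zip(stamps, stamps[1:]):
        gap = (later - earlier).total_seconds()
        if gap > best[0]:
            best = (gap, earlier, later)
    return best


sample = """Jun 29, 2018 3:43:47 PM org.apache.karaf.main.Main$KarafLockCallback lockAquired
INFO: Lock acquired. Setting startlevel to 100
Jun 29, 2018 3:50:48 PM org.apache.karaf.main.Main launch
INFO: Installing and starting initial bundles"""

gap, start, end = largest_gap(sample)
print(f"{gap:.0f} s between {start:%H:%M:%S} and {end:%H:%M:%S}")
# prints "421 s between 15:43:47 and 15:50:48"
```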
_______________________________________________
controller-dev mailing list
controller-dev@lists.opendaylight.org
https://lists.opendaylight.org/mailman/listinfo/controller-dev