On Mon, Jul 2, 2018 at 2:44 PM Tom Pantelis <tompante...@gmail.com> wrote:

>
>
> On Mon, Jul 2, 2018 at 2:15 PM, Victor Pickard <vpick...@redhat.com>
> wrote:
>
>> Hi all,
>>
>> I'm looking at clustering stability. One of the jobs I've been looking at is 
>> controller clustering. This is a good CSIT, in that it stops and starts ODL 
>> several times during the run.
>>
>> In one of the failed test runs (sandbox, logs wiped from last week, but I do
>> have this particular karaf log archived locally), ODL is started, and REST
>> calls fail during the test. Looking at the logs, I can see why. Karaf failed
>> to start, or rather, took a really long time to start. From the snippet
>> below, you can see a gap of about 7 minutes between when Karaf launched and
>> when it did something again, maybe a restart. But the main thing is that karaf
>> did not start in a timely manner, taking over 7 minutes to begin starting
>> blueprints, etc.
>>
>>
>> I ran a job that had karaf debug logging enabled with this setting:
>>
>> log4j.rootLogger=DEBUG
>>
>>
>> This did not go very well. It generated far too much debug output and
>> caused timeouts and various other errors in the CSIT run.
>>
>>
>> So, my questions are:
>>
>> 1. Has anyone seen this issue where karaf seems to hang on startup (after a
>> kill -9 on karaf pid)? If so, is this a known issue?
>>
>> 2. What debug logging would be needed to figure out why karaf was hanging?
>> Note the setting above generated a log file of ~768 MB in a very short timespan.
>>
>>
> Vic - does this happen if you gracefully shut it down?
>

Hi Tom,
I haven't tried that. I'm just running the controller csit, which does a
kill -9 on karaf pid.


> In years past with karaf I recall corruption could occur in the bundle
> cache under data if the karaf process was killed. I don't know if that
> potential issue is still present with karaf 4. Does it clean the data dir
> before restarting? If not, it would be good to do so to be safe.
>

Here are the steps from the controller csit job for restarting ODL
(Restart Odl With Tell Based False). Looking at this, yes, the data dir is
deleted.

1. kill -9 on karaf pid ( 'ps axf | grep org.apache.karaf | grep -v grep |
awk '{print "kill -9 " $1}' | sh' )
2. Verify karaf is not running
3. Set Tell Based to False in config file
4. Copy karaf logs to /tmp
5. Clean the following directories

   1. rm -rf /tmp/karaf-0.8.3-SNAPSHOT/tmp/
   2. rm -rf /tmp/karaf-0.8.3-SNAPSHOT/data/
   3. rm -rf /tmp/karaf-0.8.3-SNAPSHOT/cache/
   4. rm -rf /tmp/karaf-0.8.3-SNAPSHOT/snapshots/
   5. rm -rf /tmp/karaf-0.8.3-SNAPSHOT/journal/
   6. rm -rf /tmp/karaf-0.8.3-SNAPSHOT/etc/opendaylight/current/
   7. rm -rf /tmp/karaf-0.8.3-SNAPSHOT/etc/host.key

6. Copy logs back to new snapshot dir, as below:

   1. mkdir -p '/tmp/karaf-0.8.3-SNAPSHOT/data' && rm -vrf
   '/tmp/karaf-0.8.3-SNAPSHOT/log' && mv -vf '/tmp/log'
   '/tmp/karaf-0.8.3-SNAPSHOT/data/'
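
For reference, the cleanup in step 5 could be sketched in Python roughly as
below. This is only an illustration, not what the CSIT job actually runs: the
helper name clean_karaf_state and its karaf_home parameter are made up for the
example, and the path list is simply copied from the steps above.

```python
import shutil
from pathlib import Path

# Paths (relative to the karaf home) removed in step 5 above.
STATE_PATHS = [
    "tmp",
    "data",
    "cache",
    "snapshots",
    "journal",
    "etc/opendaylight/current",
    "etc/host.key",
]


def clean_karaf_state(karaf_home):
    """Remove persisted state under karaf_home and return what was removed.

    Hypothetical helper mirroring the `rm -rf` calls listed in the thread.
    Paths that do not exist are skipped silently, as `rm -rf` would do.
    """
    removed = []
    for rel in STATE_PATHS:
        target = Path(karaf_home) / rel
        if target.is_dir() and not target.is_symlink():
            shutil.rmtree(target)  # directory tree, like rm -rf
            removed.append(rel)
        elif target.exists() or target.is_symlink():
            target.unlink()  # plain file or symlink, e.g. etc/host.key
            removed.append(rel)
    return removed
```

Called as clean_karaf_state("/tmp/karaf-0.8.3-SNAPSHOT"), it would cover the
same seven paths; returning the removed entries makes it easy to log what was
actually present after the kill -9.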


> Other than that, we probably need to get a thread dump.
>
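
One possible way to grab that thread dump, sketched in Python: find the Karaf
JVM pid the same way the csit kill step does (matching org.apache.karaf in the
ps output) and hand it to jstack from the JDK. The helper names are invented
for this example, and it assumes jstack is on the PATH and is run as the same
user that owns the Karaf process.

```python
import subprocess


def find_karaf_pids(ps_output):
    """Return the pids from `ps axf`-style output whose line mentions Karaf.

    Hypothetical helper: expects the pid in the first column, as in the
    `ps axf | grep org.apache.karaf` pipeline quoted in this thread.
    """
    pids = []
    for line in ps_output.splitlines():
        if "org.apache.karaf" not in line:
            continue
        fields = line.split()
        if fields and fields[0].isdigit():
            pids.append(int(fields[0]))
    return pids


def dump_threads(pid, out_path):
    """Write a thread dump of the JVM `pid` to out_path.

    Assumes the JDK's `jstack` is available; `-l` adds lock information.
    """
    with open(out_path, "w") as out:
        subprocess.run(["jstack", "-l", str(pid)], stdout=out, check=True)
```

The idea would be to feed the output of ps axf to find_karaf_pids and call
dump_threads a few times, some seconds apart, while karaf appears stuck, so the
dumps can be compared to see which threads are not making progress.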
>> Thanks,
>>
>> Vic
>>
>>
>>
>>
>> Jun 29, 2018 3:43:47 PM org.apache.karaf.main.Main launch
>> INFO: Installing and starting initial bundles
>> Jun 29, 2018 3:43:47 PM org.apache.karaf.main.Main launch
>> INFO: All initial bundles installed and set to start
>> Jun 29, 2018 3:43:47 PM org.apache.karaf.main.lock.SimpleFileLock lock
>> INFO: Trying to lock /tmp/karaf-0.8.3-SNAPSHOT/lock
>> Jun 29, 2018 3:43:47 PM org.apache.karaf.main.lock.SimpleFileLock lock
>> INFO: Lock acquired
>> Jun 29, 2018 3:43:47 PM org.apache.karaf.main.Main$KarafLockCallback lockAquired
>> INFO: Lock acquired. Setting startlevel to 100
>> Jun 29, 2018 3:50:48 PM org.apache.karaf.main.Main launch
>> INFO: Installing and starting initial bundles
>> Jun 29, 2018 3:50:49 PM org.apache.karaf.main.Main launch
>> INFO: All initial bundles installed and set to start
>> Jun 29, 2018 3:50:49 PM org.apache.karaf.main.lock.SimpleFileLock lock
>> INFO: Trying to lock /tmp/karaf-0.8.3-SNAPSHOT/lock
>> Jun 29, 2018 3:50:49 PM org.apache.karaf.main.lock.SimpleFileLock lock
>> INFO: Lock acquired
>> Jun 29, 2018 3:50:49 PM org.apache.karaf.main.Main$KarafLockCallback 
>> lockAquired
>> INFO: Lock acquired. Setting startlevel to 100
>>
>>
>>
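
The stall is easy to confirm mechanically from the timestamps in the snippet
above. A small Python sketch follows; the function name and the 60-second
threshold are arbitrary choices for illustration, and the timestamp pattern is
just the one seen in these log lines.

```python
import re
from datetime import datetime

# Matches timestamps such as "Jun 29, 2018 3:43:47 PM" anywhere in a line.
TS_RE = re.compile(r"[A-Z][a-z]{2} \d{1,2}, \d{4} \d{1,2}:\d{2}:\d{2} [AP]M")
TS_FMT = "%b %d, %Y %I:%M:%S %p"


def find_gaps(lines, threshold_s=60):
    """Return (previous, current, seconds) for each pair of consecutive
    timestamps that are more than threshold_s seconds apart."""
    gaps = []
    prev = None
    for line in lines:
        for match in TS_RE.findall(line):
            ts = datetime.strptime(match, TS_FMT)
            if prev is not None:
                delta = (ts - prev).total_seconds()
                if delta > threshold_s:
                    gaps.append((prev, ts, delta))
            prev = ts
    return gaps
```

Run over the lines quoted above, this reports a single gap of 421 seconds,
between 3:43:47 PM and 3:50:48 PM, which matches the "about 7 minutes"
observation.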
>> _______________________________________________
>> controller-dev mailing list
>> controller-dev@lists.opendaylight.org
>> https://lists.opendaylight.org/mailman/listinfo/controller-dev
>>
>>
>
