[jira] [Reopened] (KARAF-1309) Cellar causes Karaf container to freeze if system got network interface changes between container restarts

Alexey Bespaly (JIRA) Fri, 11 May 2012 04:16:19 -0700

     [ 
https://issues.apache.org/jira/browse/KARAF-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Alexey Bespaly reopened KARAF-1309:
-----------------------------------


TESB-EE-Runtime 5.1.1 SNAPSHOT #365
Apache Cellar 2.2.4 SNAPSHOT as of 8.05.2012

While retesting, I got the following state on the 4th node (after adding a 
network interface and starting all containers):

[ 198] [Active     ] [Created     ] [       ] [   80] Apache Karaf :: Cellar :: Core (2.2.4.SNAPSHOT)
[ 199] [Resolved   ] [            ] [       ] [   80] Apache Karaf :: Cellar :: Hazelcast (2.2.4.SNAPSHOT)
                                       Hosts: 197
[ 200] [Active     ] [Failure     ] [       ] [   80] Apache Karaf :: Cellar :: Config (2.2.4.SNAPSHOT)
[ 201] [Active     ] [Failure     ] [       ] [   80] Apache Karaf :: Cellar :: Features (2.2.4.SNAPSHOT)
[ 202] [Active     ] [GracePeriod ] [       ] [   80] Apache Karaf :: Cellar :: Bundle (2.2.4.SNAPSHOT)
[ 203] [Active     ] [Failure     ] [       ] [   80] Apache Karaf :: Cellar :: DOSGi (2.2.4.SNAPSHOT)
[ 204] [Active     ] [Created     ] [       ] [   80] Apache Karaf :: Cellar :: Utils (2.2.4.SNAPSHOT)
[ 205] [Resolved   ] [            ] [       ] [   80] Apache Karaf :: Cellar :: Shell (2.2.4.SNAPSHOT)
[ 206] [Resolved   ] [            ] [       ] [   80] Apache Karaf :: Cellar :: Management (2.2.4.SNAPSHOT)

tesb.log excerpt:
...
12:44:10,724 | WARN  | ol-10-thread-130 | lar.core.event.EventDispatchTask   88 | 198 - org.apache.karaf.cellar.core - 2.2.4.SNAPSHOT | Failed to retrieve handler for event class org.apache.karaf.cellar.features.RemoteFeaturesEvent
12:44:10,732 | WARN  | ol-10-thread-131 | lar.core.event.EventDispatchTask   88 | 198 - org.apache.karaf.cellar.core - 2.2.4.SNAPSHOT | Failed to retrieve handler for event class org.apache.karaf.cellar.features.RemoteFeaturesEvent
12:44:10,752 | WARN  | ol-10-thread-132 | lar.core.event.EventDispatchTask   88 | 198 - org.apache.karaf.cellar.core - 2.2.4.SNAPSHOT | Failed to retrieve handler for event class org.apache.karaf.cellar.features.RemoteFeaturesEvent
12:44:10,758 | WARN  | ol-10-thread-133 | lar.core.event.EventDispatchTask   88 | 198 - org.apache.karaf.cellar.core - 2.2.4.SNAPSHOT | Failed to retrieve handler for event class org.apache.karaf.cellar.features.RemoteFeaturesEvent
...

and finally:

12:48:41,257 | ERROR | rint Extender: 3 | ntainer.BlueprintContainerImpl$1  293 | 10 - org.apache.aries.blueprint - 0.3.1 | Unable to start blueprint container for bundle org.apache.karaf.cellar.config due to unresolved dependencies [(objectClass=org.apache.karaf.cellar.core.GroupManager)]
java.util.concurrent.TimeoutException
        at org.apache.aries.blueprint.container.BlueprintContainerImpl$1.run(BlueprintContainerImpl.java:287)[10:org.apache.aries.blueprint:0.3.1]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)[:1.6.0_30]
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)[:1.6.0_30]
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)[:1.6.0_30]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)[:1.6.0_30]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)[:1.6.0_30]
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)[:1.6.0_30]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)[:1.6.0_30]
        at java.lang.Thread.run(Thread.java:662)[:1.6.0_30]

The same error is logged for each of the failed bundles.

After all that, the container failed to shut down cleanly (it hung), with the 
following exception repeated throughout the log:

13:39:55,322 | WARN  | cached.thread-73 | dardLoggerFactory$StandardLogger   51 |  -  -  | 172.27.210.7/172.27.210.7:5704 [cellar] You probably have too long Hazelcast configuration!
java.io.IOException: Invalid argument
        at java.net.PlainDatagramSocketImpl.send(Native Method)[:1.6.0_30]
        at java.net.DatagramSocket.send(DatagramSocket.java:625)[:1.6.0_30]
        at com.hazelcast.impl.MulticastService.send(MulticastService.java:148)[197:hazelcast:1.9.4.8]
        at com.hazelcast.impl.MulticastJoiner.searchForOtherClusters(MulticastJoiner.java:95)[197:hazelcast:1.9.4.8]
        at com.hazelcast.impl.SplitBrainHandler.searchForOtherClusters(SplitBrainHandler.java:58)[197:hazelcast:1.9.4.8]
        at com.hazelcast.impl.SplitBrainHandler.access$000(SplitBrainHandler.java:22)[197:hazelcast:1.9.4.8]
        at com.hazelcast.impl.SplitBrainHandler$1.doRun(SplitBrainHandler.java:46)[197:hazelcast:1.9.4.8]
        at com.hazelcast.impl.FallThroughRunnable.run(FallThroughRunnable.java:23)[197:hazelcast:1.9.4.8]
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)[:1.6.0_30]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)[:1.6.0_30]
        at java.lang.Thread.run(Thread.java:662)[:1.6.0_30]
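
The trace above fails inside a multicast send, so one thing that may be worth 
checking on the affected host is which interfaces the JVM sees after the change 
and whether each one reports multicast support. The following is only a 
diagnostic sketch using plain java.net; the names InterfaceReport and 
describeInterfaces are made up for illustration and are not part of Karaf, 
Cellar or Hazelcast, and nothing here confirms that the new interface is the 
cause.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

// Hypothetical diagnostic helper; not part of Karaf, Cellar or Hazelcast.
public class InterfaceReport {

    // Builds one line per interface that is up:
    // name, loopback flag, multicast support, and bound addresses.
    public static List<String> describeInterfaces() throws SocketException {
        List<String> lines = new ArrayList<String>();
        Enumeration<NetworkInterface> all = NetworkInterface.getNetworkInterfaces();
        if (all == null) {
            return lines; // the platform reported no interfaces
        }
        for (NetworkInterface nif : Collections.list(all)) {
            if (!nif.isUp()) {
                continue; // skip interfaces that are down
            }
            StringBuilder sb = new StringBuilder();
            sb.append(nif.getName())
              .append(" loopback=").append(nif.isLoopback())
              .append(" multicast=").append(nif.supportsMulticast())
              .append(" addresses=");
            for (InetAddress addr : Collections.list(nif.getInetAddresses())) {
                sb.append(addr.getHostAddress()).append(' ');
            }
            lines.add(sb.toString().trim());
        }
        return lines;
    }

    public static void main(String[] args) throws SocketException {
        for (String line : describeInterfaces()) {
            System.out.println(line);
        }
    }
}
```

Running it before and after the interface change and comparing the two outputs 
would show whether the address the node binds to now belongs to an interface 
that reports multicast=false.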

                
> Cellar causes Karaf container to freeze if system got network interface 
> changes between container restarts
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: KARAF-1309
>                 URL: https://issues.apache.org/jira/browse/KARAF-1309
>             Project: Karaf
>          Issue Type: Bug
>          Components: cellar-hazelcast
>    Affects Versions: cellar-2.2.4
>         Environment: Karaf-2.2.6-SNAPSHOT from 20120403, 
> Cellar-2.2.4-SNAPSHOT from 20120312
>            Reporter: Alexey Bespaly
>            Assignee: Jean-Baptiste Onofré
>             Fix For: cellar-3.0.0, cellar-2.2.4
>
>         Attachments: logs.tgz
>
>
> 1. Started 4 Karaf instances and installed Cellar on each of them:
> cluster:node-list
>    No. Host Name             Port ID
> *    1 opti.local            5701 opti.local:5701
>      2 opti.local            5702 opti.local:5702
>      3 opti.local            5703 opti.local:5703
>      4 opti.local            5704 opti.local:5704
> opti.local = 192.168.1.86
> 2. 1,2 - default group; 3,4 - "group1" (not sure if groups are essential 
> here, but were a part of the test case)
> cluster:group-list
>   Node                 Group
>   opti.local:5701      default
>   opti.local:5702      default
>   opti.local:5703      group1
> * opti.local:5704      group1
> 3. Stopped Karaf containers
> 4. Turned the VPN client on, which added the tun0 (172.27.210.11) network interface
> 5. Restarted Karaf containers
>  - the 1st and the 2nd ones seemed to be working fine:
> cluster:node-list
>    No. Host Name             Port ID
> *    1 172.27.210.11         5701 172.27.210.11:5701
>      2 172.27.210.11         5703 172.27.210.11:5703
>      3 172.27.210.11         5702 172.27.210.11:5702
>      4 172.27.210.11         5704 172.27.210.11:5704
>  - the 3rd container became unresponsive:
> karaf@trun> osgi:list
> ...no response....
>  - the 4th shows the following static picture:
> ...skipped...
> [ 211] [Active     ] [Creating    ] [       ] [   60] hazelcast (1.9.4.6)
>                                        Fragments: 213
> [ 212] [Active     ] [Created     ] [       ] [   60] Apache Karaf :: Cellar :: Core (2.2.4.SNAPSHOT)
> [ 213] [Resolved   ] [            ] [       ] [   60] Apache Karaf :: Cellar :: Hazelcast (2.2.4.SNAPSHOT)
>                                        Hosts: 211
> [ 214] [Active     ] [GracePeriod ] [       ] [   60] Apache Karaf :: Cellar :: Config (2.2.4.SNAPSHOT)
> [ 215] [Active     ] [GracePeriod ] [       ] [   60] Apache Karaf :: Cellar :: Features (2.2.4.SNAPSHOT)
> [ 216] [Active     ] [GracePeriod ] [       ] [   60] Apache Karaf :: Cellar :: Bundle (2.2.4.SNAPSHOT)
> [ 217] [Active     ] [Created     ] [       ] [   60] Apache Karaf :: Cellar :: DOSGi (2.2.4.SNAPSHOT)
> [ 218] [Active     ] [Created     ] [       ] [   60] Apache Karaf :: Cellar :: Utils (2.2.4.SNAPSHOT)
> [ 219] [Active     ] [Created     ] [       ] [   60] Apache Karaf :: Cellar :: Shell (2.2.4.SNAPSHOT)
> [ 220] [Active     ] [Created     ] [       ] [   60] Apache Karaf :: Cellar :: Management (2.2.4.SNAPSHOT)
> karaf@trun> 
>  ...and the status does not change over time.
> Could it be that the stored configuration conflicts with the newly detected 
> one and causes this instability?
> Logs of the 4 Karaf instances are attached.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
