[ 
https://issues.apache.org/activemq/browse/AMQ-2774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=60448#action_60448
 ] 

Eric commented on AMQ-2774:
---------------------------

I'm sure that I identified a Latch problem in Multicast Network Discovery 
mechanism on Duplex connection 

The multicast notifier thread is blocked. here the trace (on Fuse-5.3.0.5)

"Notifier-MulticastDiscoveryAgent-listener:DiscoveryNetworkConnector:NOCSupervisorP5-ADMIN-OUT-IN:BrokerService[SIBBusModule-NOCP5-tpnocp08s-bus]"
 daemon prio=10 tid=0x0000000044ff2400 nid=0x1389 waiting on condition 
[0x0000000044c26000..0x0000000044c26b90] 
   java.lang.Thread.State: WAITING (parking) 
        at sun.misc.Unsafe.park(Native Method) 
        - parking to wait for  <0x00002aaab3dd66f0> (a 
java.util.concurrent.CountDownLatch$Sync) 
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) 
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:747)
 
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:905)
 
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1217)
 
        at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:207) 
        at 
org.apache.activemq.network.DemandForwardingBridgeSupport.start(DemandForwardingBridgeSupport.java:231)
 
        at 
org.apache.activemq.network.DiscoveryNetworkConnector.onServiceAdd(DiscoveryNetworkConnector.java:114)
 
        at 
org.apache.activemq.transport.discovery.multicast.MulticastDiscoveryAgent$2.run(MulticastDiscoveryAgent.java:484)
 
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) 
        at java.lang.Thread.run(Thread.java:619) 

The problem appears when the network is quickly and alternatively on/off 
between the two components. 
The bridge is created in one direction, but the answer can not be received. 

The thread is blocked on the CountDownLatch. Even if multicast frames are 
received, the component can not establish a new network connection. 

Here are an corresponding activemq trace 

When it is OK : 
2010-06-22 22:56:24,500 [-tpnocp08s-bus]] INFO  DiscoveryNetworkConnector      
- Establishing network connection from vm://SIBBusModule-NOCP5-tpnocp08s-bus to 
tcp://tpnocp11v-bus.vdm.priv.amm.noc:14101?useLocalHost=false 
2010-06-22 22:56:26,083 [nocp08s-bus#160] INFO  DemandForwardingBridge         
- Network connection between vm://SIBBusModule-NOCP5-tpnocp08s-bus#160 and 
tcp://tpnocp11v-bus.vdm.priv.amm.noc/10.18.126.30:14101(SIBBusSupervisor-tpnocp11v-bus)
 has been established. 

2010-06-22 22:57:34,807 [-tpnocp08s-bus]] INFO  DemandForwardingBridge         
- SIBBusModule-NOCP5-tpnocp08s-bus bridge to SIBBusSupervisor-tpnocp11v-bus 
stopped 

2010-06-22 22:57:34,811 [-tpnocp08s-bus]] INFO  DiscoveryNetworkConnector      
- Establishing network connection from vm://SIBBusModule-NOCP5-tpnocp08s-bus to 
tcp://tpnocp11v-bus.vdm.priv.amm.noc:14101?useLocalHost=false 
2010-06-22 22:57:39,064 [nocp08s-bus#162] INFO  DemandForwardingBridge         
- Network connection between vm://SIBBusModule-NOCP5-tpnocp08s-bus#162 and 
tcp://tpnocp11v-bus.vdm.priv.amm.noc/10.18.126.30:14101(SIBBusSupervisor-tpnocp11v-bus)
 has been established. 

2010-06-22 22:58:42,578 [-tpnocp08s-bus]] INFO  DemandForwardingBridge         
- SIBBusModule-NOCP5-tpnocp08s-bus bridge to SIBBusSupervisor-tpnocp11v-bus 
stopped 

When it is KO : "Unknown" 

2010-06-22 22:58:42,648 [-tpnocp08s-bus]] INFO  DiscoveryNetworkConnector      
- Establishing network connection from vm://SIBBusModule-NOCP5-tpnocp08s-bus to 
tcp://tpnocp11v-bus.vdm.priv.amm.noc:14101?useLocalHost=false 
2010-06-22 22:59:18,031 [18.126.30:14101] WARN  DemandForwardingBridge         
- Network connection between vm://SIBBusModule-NOCP5-tpnocp08s-bus#164 and 
tcp://tpnocp11v-bus.vdm.priv.amm.noc/10.18.126.30:14101 shutdown due to a 
remote error: java.net.SocketException: Connection reset 
2010-06-22 22:59:18,033 [NetworkBridge  ] INFO  DemandForwardingBridge         
- SIBBusModule-NOCP5-tpnocp08s-bus bridge to Unknown stopped 


Here is the other side corresponding activemq trace 

activemq-server.log:2010-06-22 22:55:44,295 [26.190.27:40517] INFO  
TransportConnection            - Created Duplex Bridge back to 
SIBBusModule-NOCP5-tpnocp08s-bus 

activemq-server.log:2010-06-22 22:56:24,438 [26.190.27:40517] INFO  
DemandForwardingBridge         - SIBBusSupervisor-tpnocp11v-bus bridge to 
SIBBusModule-NOCP5-tpnocp08s-bus stopped 

activemq-server.log:2010-06-22 22:56:26,135 [26.190.27:40518] INFO  
TransportConnection            - Created Duplex Bridge back to 
SIBBusModule-NOCP5-tpnocp08s-bus 
activemq-server.log:2010-06-22 22:56:26,135 [ocp11v-bus#1770] INFO  
DemandForwardingBridge         - Network connection between 
vm://SIBBusSupervisor-tpnocp11v-bus#1770 and 
tcp:///10.26.190.27:40518(SIBBusModule-NOCP5-tpnocp08s-bus) has been 
established. 

activemq-server.log:2010-06-22 22:57:34,818 [26.190.27:40518] INFO  
DemandForwardingBridge         - SIBBusSupervisor-tpnocp11v-bus bridge to 
SIBBusModule-NOCP5-tpnocp08s-bus stopped 

activemq-server.log:2010-06-22 22:57:39,153 [26.190.27:40519] INFO  
TransportConnection            - Created Duplex Bridge back to 
SIBBusModule-NOCP5-tpnocp08s-bus 
activemq-server.log:2010-06-22 22:57:39,153 [ocp11v-bus#1806] INFO  
DemandForwardingBridge         - Network connection between 
vm://SIBBusSupervisor-tpnocp11v-bus#1806 and 
tcp:///10.26.190.27:40519(SIBBusModule-NOCP5-tpnocp08s-bus) has been 
established. 

activemq-server.log:2010-06-22 22:58:44,328 [26.190.27:40519] INFO  
DemandForwardingBridge         - SIBBusSupervisor-tpnocp11v-bus bridge to 
SIBBusModule-NOCP5-tpnocp08s-bus stopped 


> Network of brokers : Multicast discovery stopped to work
> --------------------------------------------------------
>
>                 Key: AMQ-2774
>                 URL: https://issues.apache.org/activemq/browse/AMQ-2774
>             Project: ActiveMQ
>          Issue Type: Bug
>    Affects Versions: 5.2.0
>         Environment: Linux
>            Reporter: Eric
>             Fix For: 5.4.1
>
>         Attachments: JMAC-BEA-lastlog.log-20100315
>
>
> Hi everybody
> I experiment a big problem with the multicast discovery algorithm, in a 
> network of brokers topology.
> In some conditions, a broker can't reestablish a distant connection even if 
> the distant broker is restarted.
> I have the log traces that would help to identify the origin of the problem.
> When there is no discovery/connection error, I can see these 2 lines in the 
> activemq log file
> #08 Jun 2010 14:31:30,639  INFO  [Multicast Discovery Agent Notifier] 
> org.apache.activemq.network.DiscoveryNetworkConnector
> Establishing network connection between from vm://ACCLU-tpnocp04v to 
> tcp://tpnocp09v-bus:13100?useLocalHost=false
> #08 Jun 2010 14:31:30,692  INFO  [StartLocalBridge: 
> localBroker=vm://ACCLU-tpnocp04v#26] 
> org.apache.activemq.network.DemandForwardingBridge
> Network connection between vm://ACCLU-tpnocp04v#26 and 
> tcp://tpnocp09v-bus/10.18.126.28:13100(MOM-tpnocp09v) has been established.
> When the connection is broken, I can see this line in the log.
> #11 Jun 2010 12:37:32,585  INFO  [Multicast Discovery Agent Notifier] 
> org.apache.activemq.network.DemandForwardingBridge
> ACCLU-tpnocp04v bridge to MOM-tpnocp09v stopped
> Then the current ACCLU-tpnocp04v broker tries to reestablish the connection :
> #11 Jun 2010 12:37:34,475  INFO  [Multicast Discovery Agent Notifier] 
> org.apache.activemq.network.DiscoveryNetworkConnector
> Establishing network connection between from vm://ACCLU-tpnocp04v to 
> tcp://tpnocp09v-bus:13100?useLocalHost=false
> But, here, the second line of the log ("has been established") doesn't appear 
> in the log file !! I don't know exactly if the connection is up or not.
> Then the connection is broken again (look at "Unknown" instead of 
> "MOM-tpnocp09v".
> #11 Jun 2010 13:33:58,655  WARN  [ActiveMQ Transport: 
> tcp://tpnocp09v-bus/10.18.126.28:13100] 
> org.apache.activemq.network.DemandForwardingBridge
> Network connection between vm://ACCLU-tpnocp04v#58 and 
> tcp://tpnocp09v-bus/10.18.126.28:13100 shutdown due to a remote error: 
> java.net.SocketException: Connection reset
> #11 Jun 2010 13:33:58,657  INFO  [NetworkBridge] 
> org.apache.activemq.network.DemandForwardingBridge^M
> ACCLU-tpnocp04v bridge to Unknown stopped
> And, now, even if I restart the distant broker ( MOM-tpnocp09v ), no line 
> (Establishing/Has been established) appears, and no network connection is 
> reestablished between ACCLU-tpnocp04v and MOM-tpnocp09v. it seems that this 
> ACCLU-tpnocp04v broker can no longer establish a connection with the 
> MOM-tpnocp09v broker !!!
> The production teams tell me that this problem seems not to be resolved in 
> fuse-5.3.0.6 version.
> Eric-AWL

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to