[ 
https://issues.apache.org/activemq/browse/AMQ-2774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=60911#action_60911
 ] 

Eric commented on AMQ-2774:
---------------------------

Hi Gary

It's very difficult to simulate quick network faults. In my JUnit test, I 
simulate a close() either immediately or a few seconds later (with a random 
delay). When the close() is done immediately, I was able to validate a DUPLEX 
network of brokers and confirm that nothing was blocked with my patch:

2010-07-26 14:09:20,001 [ce[SpokeBroker]] INFO  DiscoveryNetworkConnector      
- Establishing network connection from vm://SpokeBroker to 
tcpfaulty://localhost.localdomain:61617
2010-07-26 14:09:20,035 [ocalport=32972]] INFO  SocketTstFactory               
- Trying to close client socket 
Socket[addr=localhost.localdomain/127.0.0.1,port=61617,localport=32972] 
immediatly
2010-07-26 14:09:20,036 [ocalport=32972]] INFO  SocketTstFactory               
- Client socket 
Socket[addr=localhost.localdomain/127.0.0.1,port=61617,localport=32972] is 
closed.
2010-07-26 14:09:20,037 [127.0.0.1:61617] WARN  DemandForwardingBridge         
- Network connection between vm://SpokeBroker#8 and 
tcpfaulty://localhost.localdomain/127.0.0.1:61617 shutdown due to a remote 
error: java.net.SocketException: Socket closed
2010-07-26 14:09:20,038 [NetworkBridge  ] INFO  DemandForwardingBridge         
- SpokeBroker bridge to Unknown stopped

In this kind of situation (bridge to Unknown stopped), I observed in the 
5.3.0-05 fuse production environment that the network connector thread was 
completely blocked on the latch, with duplex connections.
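
To illustrate what "blocked on the latch" means here, the following is a 
minimal sketch only. It is not the ActiveMQ bridge code, and the names 
LatchSketch and waitForRemote are hypothetical. It assumes a thread waits on 
a java.util.concurrent.CountDownLatch that is only counted down once the 
remote side has answered. If the socket is closed before that happens, an 
unbounded await() never returns, whereas a bounded await gives the caller a 
chance to give up and retry.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration; these names are not ActiveMQ classes.
class LatchSketch {

    // Waits until the latch is released or the timeout expires.
    // Returns true if the latch was released, false on timeout or interrupt.
    static boolean waitForRemote(CountDownLatch remoteAnswered, long timeoutMillis) {
        try {
            return remoteAnswered.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            // Preserve the interrupt flag for the caller and report failure.
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        // Remote side never answers: a plain await() would hang here forever.
        CountDownLatch neverReleased = new CountDownLatch(1);
        System.out.println("handshake completed: " + waitForRemote(neverReleased, 200));

        // Remote side answered before we started waiting.
        CountDownLatch released = new CountDownLatch(1);
        released.countDown();
        System.out.println("handshake completed: " + waitForRemote(released, 200));
    }
}
```

In this sketch the first call gives up after the timeout and the second 
returns at once, so the calling thread is never stuck indefinitely.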

I'm not sure that my JUnit test demonstrates the problem on 5.3.0-05. It helped 
me to debug my own patch.
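
The fault injection described above (closing the client socket immediately 
or after a random delay) could be sketched roughly as follows. This is an 
assumption-laden illustration, not the SocketTstFactory from the attached 
test: the class FaultySocketSketch, its methods, and the 2000 ms upper bound 
are all hypothetical, and it uses a plain loopback socket rather than an 
ActiveMQ transport.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.Random;

// Hypothetical sketch; not the SocketTstFactory used in the real test.
class FaultySocketSketch {

    // Returns 0 (close immediately) about half the time,
    // otherwise a delay between 1 and maxDelayMillis inclusive.
    static long pickDelayMillis(Random random, long maxDelayMillis) {
        if (maxDelayMillis <= 0 || random.nextBoolean()) {
            return 0L;
        }
        return 1 + (long) (random.nextDouble() * maxDelayMillis);
    }

    // Starts a thread that closes the socket after delayMillis
    // and returns that thread so the caller can wait for it.
    static Thread closeAfter(Socket socket, long delayMillis) {
        Thread closer = new Thread(() -> {
            try {
                if (delayMillis > 0) {
                    Thread.sleep(delayMillis);
                }
                socket.close();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } catch (IOException e) {
                // Closing a test socket failed; nothing useful to do in a sketch.
            }
        }, "socket-closer");
        closer.start();
        return closer;
    }

    public static void main(String[] args) throws Exception {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 lets the OS pick a free port for the fake remote broker.
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            Socket client = new Socket(loopback, server.getLocalPort());
            long delay = pickDelayMillis(new Random(), 2000);
            closeAfter(client, delay).join();
            System.out.println("closed after " + delay + " ms: " + client.isClosed());
        }
    }
}
```

Running the sketch several times exercises both the immediate close and the 
delayed close, which is the mix of timings the test relies on.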

I haven't tried my JUnit test on the 5.3.0-5 fuse version yet. I'm going to 
check whether it sometimes shows the problem with the 5.3.0-5 core jar.

I can look at the 5.4-snapshot source code to see whether anything has already 
changed around this latch on trunk.

I will let you know my results.

Eric-AWL


> Network of brokers : Multicast discovery stopped to work
> --------------------------------------------------------
>
>                 Key: AMQ-2774
>                 URL: https://issues.apache.org/activemq/browse/AMQ-2774
>             Project: ActiveMQ
>          Issue Type: Bug
>    Affects Versions: 5.2.0
>         Environment: Linux
>            Reporter: Eric
>            Assignee: Gary Tully
>             Fix For: 5.4.1
>
>         Attachments: AMQ2774.tar, JMAC-BEA-lastlog.log-20100315
>
>
> Hi everybody
> I am experiencing a big problem with the multicast discovery algorithm, in a 
> network of brokers topology.
> In some conditions, a broker can't reestablish a distant connection even if 
> the distant broker is restarted.
> I have the log traces that would help to identify the origin of the problem.
> When there is no discovery/connection error, I can see these 2 lines in the 
> activemq log file
> #08 Jun 2010 14:31:30,639  INFO  [Multicast Discovery Agent Notifier] 
> org.apache.activemq.network.DiscoveryNetworkConnector
> Establishing network connection between from vm://ACCLU-tpnocp04v to 
> tcp://tpnocp09v-bus:13100?useLocalHost=false
> #08 Jun 2010 14:31:30,692  INFO  [StartLocalBridge: 
> localBroker=vm://ACCLU-tpnocp04v#26] 
> org.apache.activemq.network.DemandForwardingBridge
> Network connection between vm://ACCLU-tpnocp04v#26 and 
> tcp://tpnocp09v-bus/10.18.126.28:13100(MOM-tpnocp09v) has been established.
> When the connection is broken, I can see this line in the log.
> #11 Jun 2010 12:37:32,585  INFO  [Multicast Discovery Agent Notifier] 
> org.apache.activemq.network.DemandForwardingBridge
> ACCLU-tpnocp04v bridge to MOM-tpnocp09v stopped
> Then the current ACCLU-tpnocp04v broker tries to reestablish the connection :
> #11 Jun 2010 12:37:34,475  INFO  [Multicast Discovery Agent Notifier] 
> org.apache.activemq.network.DiscoveryNetworkConnector
> Establishing network connection between from vm://ACCLU-tpnocp04v to 
> tcp://tpnocp09v-bus:13100?useLocalHost=false
> But, here, the second line of the log ("has been established") doesn't appear 
> in the log file !! I don't know exactly if the connection is up or not.
> Then the connection is broken again (look at "Unknown" instead of 
> "MOM-tpnocp09v"):
> #11 Jun 2010 13:33:58,655  WARN  [ActiveMQ Transport: 
> tcp://tpnocp09v-bus/10.18.126.28:13100] 
> org.apache.activemq.network.DemandForwardingBridge
> Network connection between vm://ACCLU-tpnocp04v#58 and 
> tcp://tpnocp09v-bus/10.18.126.28:13100 shutdown due to a remote error: 
> java.net.SocketException: Connection reset
> #11 Jun 2010 13:33:58,657  INFO  [NetworkBridge] 
> org.apache.activemq.network.DemandForwardingBridge
> ACCLU-tpnocp04v bridge to Unknown stopped
> And, now, even if I restart the distant broker ( MOM-tpnocp09v ), no line 
> (Establishing/Has been established) appears, and no network connection is 
> reestablished between ACCLU-tpnocp04v and MOM-tpnocp09v. It seems that this 
> ACCLU-tpnocp04v broker can no longer establish a connection with the 
> MOM-tpnocp09v broker !!!
> The production teams tell me that this problem does not seem to be resolved 
> in the fuse-5.3.0.6 version.
> Eric-AWL

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
