I was able to recreate the problem on our cluster. We use the DistributedLockManager from JGroups version 2.2.8, and under stress testing I could reproduce the problem consistently. The root cause is that JGroups fails to receive heartbeat messages from the member under stress and removes that member's channel from the group. The channel on the stressed node is then closed and can no longer send or receive messages. To prevent this, I use the following parameters, and they work fine for us:

    <FD timeout="5000" max_tries="4" shun="true"/>
    .................
    <pbcast.GMS ....... shun="false"/>

Since we have access to the channel creation code, we also set the channel option AUTO_RECONNECT right after creating the channel:

    channel.setOpt(Channel.AUTO_RECONNECT, Boolean.TRUE);

This allows the channel to restart and function again if the problem does happen. I don't know whether that is possible in your situation.

Hope this helps.

HN

View the original post : http://www.jboss.com/index.html?module=bb&op=viewtopic&p=3902667#3902667
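
For a rough feel of what the FD values above buy you, here is a small sketch. It assumes FD suspects a member after roughly timeout x max_tries milliseconds without a heartbeat response; that is my reading of the two parameters, not something measured on our cluster. The class FdWindow and both method names are made up for illustration and are not part of JGroups.

```java
public class FdWindow {

    // Hypothetical helper (not a JGroups API): approximate time in milliseconds
    // before FD suspects a silent member, assuming suspicion happens after
    // max_tries consecutive heartbeats each go unanswered for timeout ms.
    public static long suspicionWindowMillis(long timeoutMillis, int maxTries) {
        if (timeoutMillis <= 0 || maxTries <= 0) {
            throw new IllegalArgumentException("timeout and max_tries must be positive");
        }
        return Math.multiplyExact(timeoutMillis, (long) maxTries);
    }

    // True if a node that stops answering for stallMillis would come back
    // before the assumed suspicion window has elapsed.
    public static boolean survivesStall(long stallMillis, long timeoutMillis, int maxTries) {
        return stallMillis < suspicionWindowMillis(timeoutMillis, maxTries);
    }

    public static void main(String[] args) {
        // Values from the FD element above: timeout="5000" max_tries="4".
        long window = suspicionWindowMillis(5000, 4);
        System.out.println("Approximate suspicion window: " + window + " ms");

        // Example stall of 12 seconds, e.g. a node busy under stress testing.
        System.out.println("Survives a 12 s stall: " + survivesStall(12_000, 5000, 4));
    }
}
```

Under that assumption, the settings above give a stressed node about 20 seconds to answer before it is dropped from the group, so a shorter stall should not close its channel.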
