Losing heartbeats and removing from reliability protocol

Hi Ray,
In the reliability protocol of our native networking, we expect participating nodes to acknowledge the packets of reliable data they receive. When a node ‘voluntarily’ stops participating in that protocol, everything is fine (as it lets the other nodes know). If a node becomes unresponsive, however (i.e. does NOT acknowledge its data ‘in time’), its ‘non-participating’ status is determined by means of a heartbeat mechanism.

The configurable topology-discovery mechanism of OpenSplice’s native-networking protocol allows for swift discovery of remote-node presence/absence, whereas the configurable ‘reactivity-control’ mechanisms (related to retransmission timing for each individual priority band in OpenSplice) allow for detection of prolonged reactivity issues (like low-priority reliable network channels not getting enough processing resources to acknowledge the data they receive in time).

So the issue you observed could be either a topology/connectivity issue or a channel/reactivity issue. The solution could be as easy as ‘relaxing’ the discovery timing, or as complex as resolving structural switch issues and/or reconsidering the configured flow control for each utilized priority band, to ensure that a node always has enough processing resources available to handle your reliable priority bands in time.

I’m not sure what you mean by ‘issues with switcher’. The “reconnect” option typically applies to recovering from reactivity issues rather than communication/topology issues, so based on the information provided I can’t say what the problem and/or the solution is.

We would be happy to provide some consultancy on your specific issues, of course.

Thanks,
Hans

*Hans van 't Hag*
OpenSplice DDS Product Manager
PrismTech Netherlands
Email: hans.vant...@prismtech.com
Tel: +31742472572
Fax: +31742472571
Gsm: +31624654078

PrismTech is a global leader in standards-based, performance-critical middleware. Our products enable our OEM, Systems Integrator, and End User customers to build and optimize high-performance systems primarily for Mil/Aero, Communications, Industrial, and Financial Markets.

------------------------------
*From:* developer-boun...@opensplice.org [mailto: developer-boun...@opensplice.org] *On Behalf Of* Francis, Raymond
*Sent:* Friday, July 01, 2011 2:55 PM
*To:* 'developer@opensplice.org'
*Subject:* [OSPL-Dev] Losing heartbeats and removing from reliability protocol

Dear Community,

I was wondering if anyone could explain what causes heartbeats from certain nodes to be lost, followed by the removal of those nodes from the reliable protocol. What does "removing node from reliable protocol" really mean? From what I see, it seems to prevent the removed node from receiving any more data from the node that removed it.

We have other applications using the same network, and they all work fine. I suspect it could be issues with switcher, as DDS within a VLAN seems to hold, but crossing to a VLAN in another switcher fails.

I have also added the Reconnection tag to the General tag for the networking service. But it seems that once the reliable protocol fails and a node is seemingly lost, there is no way of recovering it.

Raymond Francis
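
As an illustration of the heartbeat-based removal discussed in this thread, here is a minimal Python sketch of a generic heartbeat failure detector. It is not OpenSplice's implementation, and the names `HeartbeatMonitor`, `interval` and `death_count` are hypothetical, chosen only for this example. It shows how a node that stays silent for longer than the tolerated number of heartbeat intervals is dropped, and why 'relaxing' the timing can keep a slow but live node in the protocol.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    """Toy failure detector (hypothetical, not the OpenSplice code)."""
    interval: float      # assumed: expected seconds between heartbeats
    death_count: int     # assumed: silent intervals tolerated before removal
    last_seen: dict = field(default_factory=dict)

    def on_heartbeat(self, node: str, now: float) -> None:
        # Record the arrival time; a previously removed node is simply re-added.
        self.last_seen[node] = now

    def sweep(self, now: float) -> list:
        # Drop every node whose last heartbeat is older than the allowed silence.
        limit = self.interval * self.death_count
        dead = [n for n, t in self.last_seen.items() if now - t > limit]
        for n in dead:
            del self.last_seen[n]
        return sorted(dead)


monitor = HeartbeatMonitor(interval=1.0, death_count=4)
monitor.on_heartbeat("node-a", now=0.0)
monitor.on_heartbeat("node-b", now=0.0)
monitor.on_heartbeat("node-a", now=3.0)   # node-b stays silent from here on
removed = monitor.sweep(now=4.5)           # allowed silence is 4.0 seconds
print(removed)
```

With a larger `death_count` (for example 6), the same silence of 4.5 seconds would not remove `node-b`, which is the effect 'relaxing' the timing has in this simplified model.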
_______________________________________________
OpenSplice DDS Developer Mailing List
Developer@opensplice.org
Subscribe / Unsubscribe http://dev.opensplice.org/mailman/listinfo/developer