My understanding was that we'll also requeue after sending a keepalive. That is, we won't wait for the response before requeueing. But we'll still be smart about it, in the sense that we won't send a heartbeat if the nodes are already communicating otherwise.
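
To make that concrete, here is a rough user-space sketch of the behaviour I have in mind. It is only a model under my assumptions, not o2net code: the sim_* names and the struct are made up for illustration, time is a plain millisecond counter, and the 2000 ms interval is the default quoted in the thread.

```c
#include <assert.h>
#include <stddef.h>

#define KEEPALIVE_MS 2000 /* default keepalive delay mentioned in the thread */

/* Hypothetical model of one connection; not the real o2net structures. */
struct sim_conn {
	long next_fire_ms;   /* when the queued keepalive work would run */
	int keepalives_sent; /* keepalive requests sent so far */
};

/* Any message from the peer cancels and requeues the pending keepalive. */
static void sim_on_receive(struct sim_conn *c, long now_ms)
{
	c->next_fire_ms = now_ms + KEEPALIVE_MS;
}

/*
 * The queued work fired: send a keepalive and requeue right away,
 * without waiting for the peer's response.
 */
static void sim_on_timer(struct sim_conn *c, long now_ms)
{
	c->keepalives_sent++;
	c->next_fire_ms = now_ms + KEEPALIVE_MS;
}

/*
 * Replay receive times (sorted, in ms) from 0 to end_ms inclusive and
 * return how many keepalives this node would have sent.
 */
static int sim_run(const long *rx_ms, size_t nr_rx, long end_ms)
{
	struct sim_conn c = { .next_fire_ms = KEEPALIVE_MS, .keepalives_sent = 0 };
	size_t i = 0;
	long now;

	assert(end_ms >= 0);

	for (now = 0; now <= end_ms; now++) {
		/* Traffic arriving at this tick pushes the timer back first. */
		while (i < nr_rx && rx_ms[i] == now) {
			sim_on_receive(&c, now);
			i++;
		}
		if (now == c.next_fire_ms)
			sim_on_timer(&c, now);
	}
	return c.keepalives_sent;
}
```

In this model, a peer that sends something every second for ten seconds makes sim_run() return 0, because the timer keeps getting pushed back. A completely silent peer, sim_run(NULL, 0, 10000), gives 5, i.e. one keepalive per interval, whether or not a response ever arrives.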

Srinivas Eeda wrote:
> In old code a node cancels and re queues keep alive message when it
> hears from the other node. If it didn't hear in 2 seconds, queued
> message gets fired which sends a keep alive message. And a re queue
> happens only after it hears from the other node.
>
> With the new change, a node sends keep alive every 2 seconds.
>
> Sunil Mushran wrote:
>> How will it double? The node will send a keepalive only if it has
>> not heard from the other node for 2 secs.
>>
>> Srinivas Eeda wrote:
>>> No harm, just doubles heartbeat messages which is not required at all.
>>>
>>> Sunil Mushran wrote:
>>>> What's the harm in leaving it in?
>>>>
>>>> Srinivas Eeda wrote:
>>>>> Each node that has this patch would send a
>>>>> O2NET_MSG_KEEP_REQ_MAGIC every 2 seconds(default). So, nodes
>>>>> without this patch would always receive a heartbeat message every
>>>>> 2 seconds.
>>>>>
>>>>> Nodes without this patch will send(respond) with
>>>>> O2NET_MSG_KEEP_RESP_MAGIC for every keep alive packet they
>>>>> received. So nodes with this patch will always receive a response
>>>>> message.
>>>>>
>>>>> So, in a mixed setup, both nodes will always hear the heartbeat
>>>>> from each other :).
>>>>>
>>>>> thanks,
>>>>> --Srini
>>>>>
>>>>> Joel Becker wrote:
>>>>>> On Thu, Jan 28, 2010 at 08:51:11PM -0800, Srinivas Eeda wrote:
>>>>>>>  	case O2NET_MSG_KEEP_REQ_MAGIC:
>>>>>>> -		o2net_sendpage(sc, o2net_keep_resp,
>>>>>>> -			       sizeof(*o2net_keep_resp));
>>>>>>> +		/* Each node now sends keepalive message every
>>>>>>> +		 * keepalive time interval. Hence no need for response
>>>>>>> +		 */
>>>>>>>  		goto out;
>>>>>>
>>>>>> You still have to send the response. Think about a mixed
>>>>>> environment where some nodes have this fix and some do not. The
>>>>>> older software is still waiting on the response.
>>>>>> The newer version can just ignore any responses it gets from
>>>>>> other nodes. But it has to send responses out just in case the
>>>>>> other node is older.
>>>>>> The only other alternative is to bump the o2net protocol
>>>>>> version, and that means the cluster has to be shut down to
>>>>>> upgrade. Not a good choice.
>>>>>>
>>>>>> Joel
>>>>>
>>>>> _______________________________________________
>>>>> Ocfs2-devel mailing list
>>>>> Ocfs2-devel@oss.oracle.com
>>>>> http://oss.oracle.com/mailman/listinfo/ocfs2-devel