Re: [Ocfs2-devel] [PATCH 3/3] o2net: correct keepalive message protocol
Each node that has this patch sends an O2NET_MSG_KEEP_REQ_MAGIC every 2 seconds (the default). So nodes without this patch always receive a heartbeat message every 2 seconds. Nodes without this patch respond with O2NET_MSG_KEEP_RESP_MAGIC to every keepalive packet they receive, so nodes with this patch always receive a response message. So, in a mixed setup, both nodes will always hear the heartbeat from each other :).

thanks,
--Srini

Joel Becker wrote:
> On Thu, Jan 28, 2010 at 08:51:11PM -0800, Srinivas Eeda wrote:
>>      case O2NET_MSG_KEEP_REQ_MAGIC:
>> -        o2net_sendpage(sc, o2net_keep_resp,
>> -                       sizeof(*o2net_keep_resp));
>> +        /* Each node now sends keepalive message every
>> +         * keepalive time interval. Hence no need for response
>> +         */
>>          goto out;
>
> You still have to send the response. Think about a mixed environment where some nodes have this fix and some do not. The older software is still waiting on the response. The newer version can just ignore any responses it gets from other nodes. But it has to send responses out just in case the other node is older.
> The only other alternative is to bump the o2net protocol version, and that means the cluster has to be shut down to upgrade. Not a good choice.
>
> Joel

___
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-devel
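Srini's mixed-setup argument can be checked with a toy model. The sketch below is only an illustration under stated assumptions: `struct node`, `deliver_keep_req` and `simulate_idle_link` are hypothetical names, not o2net code, and the model assumes an otherwise idle link on which an unpatched node's own idle timer never fires because it hears a request every interval.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one peer; not the kernel's data structures. */
struct node {
    bool patched;  /* true: sends KEEP_REQ every interval and never replies */
    int  heard;    /* keepalive messages received from the peer */
};

/* One KEEP_REQ travels from `from` to `to`. */
void deliver_keep_req(struct node *from, struct node *to)
{
    assert(from && to);
    to->heard++;           /* the receiver hears the request */
    if (!to->patched)
        from->heard++;     /* old code answers with KEEP_RESP */
}

/* Run `intervals` keepalive periods on an otherwise idle link. */
void simulate_idle_link(struct node *a, struct node *b, int intervals)
{
    for (int i = 0; i < intervals; i++) {
        if (a->patched)
            deliver_keep_req(a, b);
        if (b->patched)
            deliver_keep_req(b, a);
    }
}
```

In this model a patched node paired with an unpatched one hears one response per interval, and the unpatched node hears one request per interval, which is the behaviour Srini describes. It does not model Joel's concern that older code might specifically wait for a response.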
Re: [Ocfs2-devel] [PATCH 3/3] o2net: correct keepalive message protocol
What's the harm in leaving it in?
Re: [Ocfs2-devel] [PATCH 3/3] o2net: correct keepalive message protocol
No harm; it just doubles the heartbeat messages, which is not required at all.

Sunil Mushran wrote:
> What's the harm in leaving it in?
Re: [Ocfs2-devel] [PATCH 3/3] o2net: correct keepalive message protocol
How will it double? The node will send a keepalive only if it has not heard from the other node for 2 secs.

Srinivas Eeda wrote:
> No harm, just doubles heartbeat messages which is not required at all.
Re: [Ocfs2-devel] [PATCH 3/3] o2net: correct keepalive message protocol
In the old code, a node cancels and requeues the keepalive message whenever it hears from the other node. If it hasn't heard anything in 2 seconds, the queued message fires, which sends a keepalive message. A requeue then happens only after it hears from the other node. With the new change, a node sends a keepalive every 2 seconds.

Sunil Mushran wrote:
> How will it double? The node will send a keepalive only if it has not heard from the other node for 2 secs.
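The difference between the old and the patched behaviour described above can be sketched as a one-second-resolution simulation. This is a simplified model, not the o2net implementation: the function names, the `rx` array (one flag per second, true if anything arrived from the peer) and the assumption that the timer starts armed are all made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define KEEPALIVE_INTERVAL 2  /* seconds; the default quoted in the thread */

/* Old behaviour: the timer is (re)armed only when something arrives. */
int keepalives_old(const bool *rx, size_t seconds)
{
    int sent = 0;
    bool armed = true;                  /* assumed armed at connect time */
    size_t deadline = KEEPALIVE_INTERVAL;

    assert(rx != NULL);
    for (size_t t = 1; t <= seconds; t++) {
        if (rx[t - 1]) {                /* heard from the peer: cancel + requeue */
            armed = true;
            deadline = t + KEEPALIVE_INTERVAL;
        } else if (armed && t >= deadline) {
            sent++;                     /* idle for a full interval: send keepalive */
            armed = false;              /* not requeued until the peer is heard */
        }
    }
    return sent;
}

/* Patched behaviour: one keepalive per interval, regardless of traffic. */
int keepalives_patched(size_t seconds)
{
    return (int)(seconds / KEEPALIVE_INTERVAL);
}
```

Against a peer that stays silent for 10 seconds, the old model sends a single keepalive and then waits, while the patched model keeps sending one every interval.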
Re: [Ocfs2-devel] [PATCH 3/3] o2net: correct keepalive message protocol
Ok, I'll modify the patch. Are the messages queued on o2net_wq and the execution of o2net_process_message always done in the context of the o2net thread, and are they synchronized?

On 2/17/2010 3:50 PM, Sunil Mushran wrote:
> My understanding was that we'll also requeue after sending a keepalive. As in, not wait for the response to requeue. But we'll still be smart about it, in the sense that we do not send a hb if the nodes are communicating otherwise.
>
> Srinivas Eeda wrote:
>> In old code a node cancels and re queues keep alive message when it hears from the other node. If it didn't hear in 2 seconds, queued message gets fired which sends a keep alive message. And a re queue happens only after it hears from the other node. With the new change, a node sends keep alive every 2 seconds.
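The variant Sunil describes in this message (requeue right after sending, but stay quiet while other traffic is flowing) might look like the following in the same kind of toy model. `keepalives_requeue_on_send`, `KEEPALIVE_SECS` and the per-second `rx` flags are assumptions for illustration, not names from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define KEEPALIVE_SECS 2  /* seconds; the default quoted in the thread */

/*
 * Count keepalives sent when the timer is requeued both on receive and
 * immediately after each send, without waiting for a response.
 * rx[i] is true if anything arrived from the peer during second i + 1.
 */
int keepalives_requeue_on_send(const bool *rx, size_t seconds)
{
    int sent = 0;
    size_t deadline = KEEPALIVE_SECS;

    assert(rx != NULL);
    for (size_t t = 1; t <= seconds; t++) {
        if (rx[t - 1]) {
            /* any traffic counts as a heartbeat: push the deadline out */
            deadline = t + KEEPALIVE_SECS;
        } else if (t >= deadline) {
            sent++;
            /* requeue at once instead of waiting for the reply */
            deadline = t + KEEPALIVE_SECS;
        }
    }
    return sent;
}
```

With this policy a silent peer still gets a keepalive every interval, while a busy link carries no extra heartbeat traffic.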
Re: [Ocfs2-devel] [PATCH 3/3] o2net: correct keepalive message protocol
Yes, they don't expect or wait for a response to a keepalive message.

On 2/17/2010 5:49 PM, Joel Becker wrote:
> On Wed, Feb 17, 2010 at 10:24:30AM -0800, Srinivas Eeda wrote:
>> Each node that has this patch would send an O2NET_MSG_KEEP_REQ_MAGIC every 2 seconds (default). So, nodes without this patch would always receive a heartbeat message every 2 seconds.
>
> So, old nodes do not care if they never receive RESP_MAGIC as long as they got some other message?
>
> Joel
Re: [Ocfs2-devel] [PATCH 3/3] o2net: correct keepalive message protocol
On Thu, Jan 28, 2010 at 08:51:11PM -0800, Srinivas Eeda wrote:
>      case O2NET_MSG_KEEP_REQ_MAGIC:
> -        o2net_sendpage(sc, o2net_keep_resp,
> -                       sizeof(*o2net_keep_resp));
> +        /* Each node now sends keepalive message every
> +         * keepalive time interval. Hence no need for response
> +         */
>          goto out;

You still have to send the response. Think about a mixed environment where some nodes have this fix and some do not. The older software is still waiting on the response. The newer version can just ignore any responses it gets from other nodes. But it has to send responses out just in case the other node is older.
The only other alternative is to bump the o2net protocol version, and that means the cluster has to be shut down to upgrade. Not a good choice.

Joel

-- 
Life's Little Instruction Book #464
Don't miss the magic of the moment by focusing on what's to come.

Joel Becker
Principal Software Developer
Oracle
E-mail: joel.bec...@oracle.com
Phone: (650) 506-8127
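Joel's suggestion (keep answering requests, ignore responses) can be sketched as a stand-alone handler. This is a user-space illustration only: `struct conn_model`, `send_keep_resp`, `process_keepalive` and the enum values are hypothetical stand-ins, not the kernel's `o2net_process_message` or its real magic numbers.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder message types; the values are made up for this sketch. */
enum msg_magic { MSG_KEEP_REQ = 1, MSG_KEEP_RESP = 2, MSG_DATA = 3 };

/* Hypothetical per-connection counters used to observe the behaviour. */
struct conn_model {
    int responses_sent;  /* KEEP_RESP messages sent back to the peer */
    int messages_seen;   /* everything received; any of it proves liveness */
};

/* Stand-in for the real send path. */
void send_keep_resp(struct conn_model *c)
{
    c->responses_sent++;
}

void process_keepalive(struct conn_model *c, enum msg_magic magic)
{
    assert(c != NULL);
    c->messages_seen++;

    switch (magic) {
    case MSG_KEEP_REQ:
        /* still reply, in case the peer runs the older code */
        send_keep_resp(c);
        break;
    case MSG_KEEP_RESP:
        /* newer code no longer depends on this; accept and ignore */
        break;
    default:
        /* regular payloads would be dispatched elsewhere */
        break;
    }
}
```

The point of the design is compatibility: replying costs one small message per request, whereas dropping the reply is only safe if no peer in the cluster relies on it.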