The more relevant question would be: with TCP_KEEPALIVE and TCP_USER_TIMEOUT set on sockets, do we really need a ping-pong framework in clients? We might need it in transport/rdma setups, but my question is focused on transport/socket. In other words, I would like to hear why we need a heart-beat mechanism in the first place. One scenario might be a healthy socket-level connection but an unhealthy brick/client (like a deadlocked one). Are there enough such realistic scenarios to make ping-pong/heartbeat necessary? In what other ways can a brick/client go bad?
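For reference, a minimal sketch (assuming Linux with a reasonably recent glibc) of how such keepalive behaviour could be configured on a connected TCP socket; the option values below are purely illustrative and are not what transport/socket actually sets:

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Illustrative only: enable keepalive probes and TCP_USER_TIMEOUT on an
 * already-connected socket fd. Values are placeholders, not gluster
 * defaults. */
int set_tcp_timeouts(int fd)
{
        int on = 1;
        int idle = 20;      /* seconds of idleness before the first probe */
        int interval = 2;   /* seconds between probes */
        int count = 5;      /* failed probes before the connection is dropped */
        unsigned int user_timeout = 30000; /* ms unacked data may stay queued */

        if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
                return -1;
        if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle)) < 0)
                return -1;
        if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval,
                       sizeof(interval)) < 0)
                return -1;
        if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &count, sizeof(count)) < 0)
                return -1;
        if (setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT, &user_timeout,
                       sizeof(user_timeout)) < 0)
                return -1;
        return 0;
}

With TCP_USER_TIMEOUT set, the kernel itself aborts the connection when written data stays unacknowledged for longer than the timeout, which is part of what the ping-pong framework duplicates at the RPC layer.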
On Thu, Jan 19, 2017 at 3:36 PM, Raghavendra G <[email protected]> wrote:
>
> On Thu, Jan 19, 2017 at 1:50 PM, Mohammed Rafi K C <[email protected]> wrote:
>
>> Hi,
>>
>> The patch for priority-based ping packets [1] is ready for review. As
>> Shyam mentioned in the comment on patch set 12, it doesn't solve the
>> problem of network congestion, nor that of disk latency. It also won't
>> prioritise the reply to ping packets at the server end (we don't have a
>> straightforward way to identify the prognum in the reply).
>>
>> So my question is: is it worth taking the patch, or do we need to think
>> through a more generic solution?
>
> Though ping requests can take more time to reach the server due to heavy
> traffic, realistically speaking the common reasons for ping-timer expiry
> have been either:
>
> 1. the client not being able to read the ping response [2], or
> 2. the server not being able to read the ping request.
>
> Speaking about 2 above, Kritika, Pranith and I were discussing this
> morning an issue where they hit ping-timer expiry in replicated setups
> when disk usage was high. The reason for this, as Pranith pointed out,
> was:
>
> 1. posix has some fops (like posix_xattrop, posix_fxattrop) which do
> syscalls after holding a lock on the inode (inode->lock).
> 2. During high disk usage, syscall latencies were high (sometimes >=
> the ping-timeout value).
> 3. Before being handed over to a new thread at the io-threads xlator, a
> fop gets executed in one of the threads that read incoming messages from
> the socket. This execution path includes translators like
> protocol/server, index, quota-enforcer and marker, and these translators
> might access the inode-ctx, which involves locking the inode
> (inode->lock). Because of this locking, the latency of the syscall gets
> transferred to the poller thread. Since the poller thread is waiting on
> inode->lock, it cannot read ping requests from the network in time,
> resulting in ping-timer expiry.
>
> I think Kritika is working on a patch to eliminate the locking on the
> inode in 1 above. We also need to reduce the actual fop execution done in
> the poller thread; in other words, we need to hand over fop execution to
> io-threads/syncop-threads as early as we can. [3] helps in this scenario
> as it adds the socket back for polling immediately after reading the
> entire msg but before execution of the fop begins. So, even though fop
> execution is happening in a poller thread, msgs from the same connection
> can be read in other poller threads in parallel (and we can scale up the
> number of epoll threads when load is high).
>
> Also, note that there is no way we can send the entire ping request as
> "URGENT" data over the network. So, the prioritization in [1] applies
> only to the queue of messages waiting to be written to the network.
> Though I suggested [1], the more I think of it, the less relevant it
> seems.
>
> [2] http://review.gluster.org/12402
> [3] http://review.gluster.org/15036
>
>> Note: We could make this patch more generic so that any packet can be
>> marked as priority and added at the head of the queue, instead of just
>> ping packets.
>>
>> [1] : http://review.gluster.org/#/c/11935/
>>
>> Regards
>>
>> Rafi KC
>
> --
> Raghavendra G

--
Raghavendra G
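To illustrate the scenario in point 3 above, here is a minimal, self-contained sketch (not actual GlusterFS code; inode_t, fop_worker and poller are hypothetical stand-ins) of how a slow syscall performed under inode->lock can stall the poller thread past the ping-timeout:

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

typedef struct {
        pthread_mutex_t lock;   /* stands in for inode->lock */
} inode_t;

static inode_t inode = { .lock = PTHREAD_MUTEX_INITIALIZER };

/* posix_fxattrop()-like path: disk I/O issued while the lock is held. */
static void *fop_worker(void *arg)
{
        (void)arg;
        pthread_mutex_lock(&inode.lock);
        sleep(3);               /* stands in for a syscall that outlasts ping-timeout */
        pthread_mutex_unlock(&inode.lock);
        return NULL;
}

/* Poller thread: translators such as protocol/server or index touch the
 * inode ctx under inode->lock before io-threads takes over, so the syscall
 * latency is transferred here and the next ping request is read late. */
static void *poller(void *arg)
{
        (void)arg;
        pthread_mutex_lock(&inode.lock);    /* blocks behind the slow syscall */
        pthread_mutex_unlock(&inode.lock);
        printf("ping request finally read, after the lock was released\n");
        return NULL;
}

int main(void)
{
        pthread_t t1, t2;
        pthread_create(&t1, NULL, fop_worker, NULL);
        sleep(1);               /* let the fop grab the lock first */
        pthread_create(&t2, NULL, poller, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        return 0;
}

Built with gcc -pthread, the poller's message only appears once the simulated syscall finishes, which is exactly how disk latency shows up as ping-timer expiry on an otherwise healthy connection.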
_______________________________________________
Gluster-devel mailing list
[email protected]
http://lists.gluster.org/mailman/listinfo/gluster-devel
