Hi, I do not think glusterfs can currently tell why a ping-timeout happened. By the time a user learns that such an event occurred, the client has already disconnected and reconnected, so the issue can no longer be debugged. One reason ping-timeouts may happen is that the epoll thread is busy doing something, most probably waiting on a mutex lock. So I am thinking maybe we should record some extra information around lock acquisition, namely how long a thread waited to acquire each lock and how long the critical section took to execute, and report it at the time of the disconnect.
pseudo code:

PTHREAD_MUTEX_LOCK(lock)
{
        get the current time to T1;
        pthread_mutex_lock (lock);
        get the current time to T2;
        if T2-T1 is greater than the already recorded wait time, update it
        // maybe we should also remember the xlator in which it happened.
}

PTHREAD_MUTEX_UNLOCK(lock)
{
        get the current time to T3;
        pthread_mutex_unlock (lock);
        if T3-T2 is greater than the already recorded hold time, update it
}

Something similar should be done for spin_locks as well. When a disconnect event comes, this information will be logged along with the disconnect messages. If you can think of anything else, please add it to the thread, and we will make a call after a while to see what all can be done to debug such issues further.

Pranith
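
P.S. A rough C sketch of the lock/unlock wrappers described above, in case it helps the discussion. The names timed_mutex_t, timed_mutex_lock, timed_mutex_unlock and timed_mutex_report are made up for illustration and are not existing glusterfs code. It assumes a POSIX system with CLOCK_MONOTONIC. One deliberate difference from the pseudo code: the hold-time record is updated just before pthread_mutex_unlock rather than after it, so the statistics are still protected by the lock when they are written.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical wrapper type: a mutex plus the worst timings seen so far. */
typedef struct {
    pthread_mutex_t mutex;
    struct timespec acquired_at; /* T2; only touched while holding mutex */
    int64_t max_wait_ns;         /* longest T2 - T1 (time spent waiting) */
    int64_t max_hold_ns;         /* longest T3 - T2 (critical section)   */
} timed_mutex_t;

/* Nanoseconds elapsed between two timestamps. */
int64_t elapsed_ns(const struct timespec *from, const struct timespec *to)
{
    return (int64_t)(to->tv_sec - from->tv_sec) * 1000000000LL
           + (int64_t)(to->tv_nsec - from->tv_nsec);
}

int timed_mutex_init(timed_mutex_t *tm)
{
    tm->max_wait_ns = 0;
    tm->max_hold_ns = 0;
    return pthread_mutex_init(&tm->mutex, NULL);
}

int timed_mutex_lock(timed_mutex_t *tm)
{
    struct timespec t1;
    clock_gettime(CLOCK_MONOTONIC, &t1);               /* T1 */

    int ret = pthread_mutex_lock(&tm->mutex);
    if (ret != 0)
        return ret;

    clock_gettime(CLOCK_MONOTONIC, &tm->acquired_at);  /* T2 */
    int64_t waited = elapsed_ns(&t1, &tm->acquired_at);
    if (waited > tm->max_wait_ns)
        tm->max_wait_ns = waited;
    return 0;
}

int timed_mutex_unlock(timed_mutex_t *tm)
{
    struct timespec t3;
    clock_gettime(CLOCK_MONOTONIC, &t3);               /* T3 */

    /* Update while still holding the lock so the write is not racy. */
    int64_t held = elapsed_ns(&tm->acquired_at, &t3);
    if (held > tm->max_hold_ns)
        tm->max_hold_ns = held;

    return pthread_mutex_unlock(&tm->mutex);
}

/* Would be called from the disconnect path to dump the worst cases. */
void timed_mutex_report(timed_mutex_t *tm, const char *name, FILE *out)
{
    assert(tm != NULL && name != NULL && out != NULL);

    /* Plain lock/unlock here so reporting does not skew the statistics. */
    pthread_mutex_lock(&tm->mutex);
    int64_t wait_ns = tm->max_wait_ns;
    int64_t hold_ns = tm->max_hold_ns;
    pthread_mutex_unlock(&tm->mutex);

    fprintf(out, "%s: max wait %lld ns, max hold %lld ns\n",
            name, (long long)wait_ns, (long long)hold_ns);
}
```

Callers would swap pthread_mutex_lock/unlock for the timed variants and call timed_mutex_report(&tm, "some-lock-name", stderr) when the disconnect is logged. This costs two clock_gettime calls per lock/unlock pair, so whether that overhead is acceptable on hot paths would need measuring.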