[Gluster-devel] debugging ping timeouts

Pranith Kumar Karampuri Fri, 21 Mar 2014 02:26:27 -0700

hi,
    I do not think glusterfs can currently tell why a ping-timeout
happened. By the time a user learns of such an event, the client has already
disconnected and reconnected, so the issue can no longer be debugged. One
reason ping-timeouts may happen is that the epoll thread is busy doing
something, most probably waiting on a mutex lock. So I am thinking maybe we
should record some extra information around lock acquisition, namely how long
a thread waited for the lock and how long the critical section ran, and
report it at the time of disconnect.


pseudo code:

PTHREAD_MUTEX_LOCK(lock) {
     get the current time to T1;
     pthread_mutex_lock (lock);
     get the current time to T2;
     if T2-T1 is greater than the already recorded wait time, update it
     // maybe we should also remember the xlator in which it happened.
}

PTHREAD_MUTEX_UNLOCK(lock) {
     get the current time to T3;
     pthread_mutex_unlock (lock);
     if T3-T2 is greater than the already recorded hold time, update it
     // T2 is the time saved when this lock was acquired
}

Something similar should be done for spin_locks as well.

When a disconnect event comes, this information will be logged along with
the disconnect messages.
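
To make the idea concrete, here is a rough C sketch of the pseudo code above. It is only an illustration under assumptions, not existing glusterfs code: the names timed_mutex_t, lock_stats, timed_mutex_lock, timed_mutex_unlock and lock_stats_report are all made up for this mail, and a real patch would probably keep the numbers per xlator and print through the glusterfs logging macros instead of a FILE pointer.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical wrapper: a mutex plus the time it was last acquired (T2). */
typedef struct {
    pthread_mutex_t mutex;
    struct timespec acquired_at;   /* meaningful only while the lock is held */
} timed_mutex_t;

/* Hypothetical process-wide record of the worst cases seen so far. */
typedef struct {
    pthread_mutex_t guard;
    double max_wait_sec;           /* longest T2 - T1 (waiting for a lock) */
    double max_hold_sec;           /* longest T3 - T2 (critical section)   */
} lock_stats_t;

static lock_stats_t lock_stats = { PTHREAD_MUTEX_INITIALIZER, 0.0, 0.0 };

static double elapsed_sec(const struct timespec *from,
                          const struct timespec *to)
{
    return (double)(to->tv_sec - from->tv_sec) +
           (double)(to->tv_nsec - from->tv_nsec) / 1e9;
}

int timed_mutex_init(timed_mutex_t *m)
{
    m->acquired_at.tv_sec = 0;
    m->acquired_at.tv_nsec = 0;
    return pthread_mutex_init(&m->mutex, NULL);
}

int timed_mutex_lock(timed_mutex_t *m)
{
    struct timespec t1, t2;
    int rc;

    clock_gettime(CLOCK_MONOTONIC, &t1);            /* T1: before waiting */
    rc = pthread_mutex_lock(&m->mutex);
    if (rc != 0)
        return rc;
    clock_gettime(CLOCK_MONOTONIC, &t2);            /* T2: lock acquired  */
    m->acquired_at = t2;

    double waited = elapsed_sec(&t1, &t2);
    pthread_mutex_lock(&lock_stats.guard);
    if (waited > lock_stats.max_wait_sec)
        lock_stats.max_wait_sec = waited;
    pthread_mutex_unlock(&lock_stats.guard);
    return 0;
}

int timed_mutex_unlock(timed_mutex_t *m)
{
    struct timespec t2 = m->acquired_at;            /* read while still held */
    struct timespec t3;
    int rc;

    clock_gettime(CLOCK_MONOTONIC, &t3);            /* T3: about to release */
    rc = pthread_mutex_unlock(&m->mutex);

    double held = elapsed_sec(&t2, &t3);
    pthread_mutex_lock(&lock_stats.guard);
    if (held > lock_stats.max_hold_sec)
        lock_stats.max_hold_sec = held;
    pthread_mutex_unlock(&lock_stats.guard);
    return rc;
}

/* Would be called from the disconnect path to dump what was recorded. */
void lock_stats_report(FILE *out)
{
    pthread_mutex_lock(&lock_stats.guard);
    fprintf(out, "max lock wait: %.6f s, max critical section: %.6f s\n",
            lock_stats.max_wait_sec, lock_stats.max_hold_sec);
    pthread_mutex_unlock(&lock_stats.guard);
}
```

The sketch uses CLOCK_MONOTONIC so that wall-clock adjustments do not distort the durations. Note that the statistics themselves are protected by a second mutex, which adds overhead to every lock and unlock; whether that cost is acceptable on hot paths is something we would have to measure.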

If you can think of anything else, please add it to the thread. We will
make a call after a while on what can be done to debug such issues
further.

Pranith

_______________________________________________
Gluster-devel mailing list
Gluster-devel@nongnu.org
https://lists.nongnu.org/mailman/listinfo/gluster-devel
