Hi,

Some further thoughts...

Whenever we find a problem related to a lock, it is a good plan to understand where the problem actually lies. In other words, is the locking itself slow, or is the issue some action that is being performed under the lock? We can easily create histograms of DLM lock times, and almost as easily create histograms of the glock times (gfs2_glock_queue -> gfs2_promote). We can also easily filter on the glock type (rgrp) and the lock transitions that we care about (any -> EX). So it would be interesting to look at this in order to get more insight into what is really going on.
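As a rough sketch of the glock timing idea, the following pairs each gfs2_glock_queue event with the next gfs2_promote for the same lock and buckets the latencies by powers of two. The record layout (timestamp in microseconds, event name, lock id, requested state) is a hypothetical, already-parsed form and not the real trace output format. Keying on the lock id alone is a simplification, since a real glock can have several holders queued at once.

```python
from collections import Counter

# Hypothetical pre-parsed trace records:
# (timestamp_us, event, lock_id, requested_state).
# The lock ids and timestamps are made up for illustration.
events = [
    (100, "gfs2_glock_queue", "rgrp/1", "EX"),
    (130, "gfs2_glock_queue", "rgrp/2", "EX"),
    (150, "gfs2_promote", "rgrp/1", "EX"),
    (1130, "gfs2_promote", "rgrp/2", "EX"),
]


def glock_latencies(events, state="EX"):
    """Return queue -> promote latencies for requests of the given state."""
    pending = {}    # lock_id -> timestamp of the earliest unmatched queue event
    latencies = []
    for ts, event, lock_id, requested in sorted(events):
        if requested != state:
            continue    # filter on the transition we care about (any -> EX)
        if event == "gfs2_glock_queue":
            pending.setdefault(lock_id, ts)
        elif event == "gfs2_promote" and lock_id in pending:
            latencies.append(ts - pending.pop(lock_id))
    return latencies


def log2_histogram(latencies):
    """Bucket n counts latencies in the range [2**n, 2**(n+1))."""
    hist = Counter()
    for lat in latencies:
        hist[max(lat, 1).bit_length() - 1] += 1
    return dict(hist)


latencies = glock_latencies(events)
histogram = log2_histogram(latencies)
```

With the sample records above, the two requests take 50us and 1000us, so they land in different buckets even though both are rgrp EX requests.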

Taking the raw histogram and multiplying the count by the centre of each bucket gives us total time taken for each different lock latency. Then it is easy to see which latencies are the ones causing the most delay.
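A small sketch of that weighting, with made-up bucket boundaries and counts purely for illustration (none of these numbers come from a real trace):

```python
def time_per_bucket(histogram):
    """Weight each bucket's count by the bucket centre.

    histogram is a list of (low, high, count) tuples; the result is a list
    of (low, high, total_time) tuples in the same order.
    """
    return [(low, high, count * (low + high) / 2)
            for low, high, count in histogram]


# Hypothetical latency histogram in microseconds: (low, high, count).
histogram = [
    (0, 10, 1000),
    (10, 100, 200),
    (100, 1000, 20),
    (1000, 10000, 5),
]

weighted = time_per_bucket(histogram)
worst = max(weighted, key=lambda bucket: bucket[2])

for low, high, total in weighted:
    print(f"{low:>5}-{high:<5} us: {total:>8.0f} us total")
```

In this invented data the five slowest requests account for more total time than the thousand fastest ones, which is exactly the kind of thing the raw counts hide.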

It would also be interesting to know how long it takes to allocate and deallocate a block. What are the operations that take the most time? Unfortunately our block allocation tracepoint doesn't give us that information, but it is probably not that tricky to alter it so that it does.

I think that would give us a much more detailed picture of what is going on.

Steve.
