----- Original Message -----
>
>
> On 08/10/18 13:59, Mark Syms wrote:
> > That sounds entirely reasonable so long as you are absolutely sure that
> > nothing is ever going to mess with that glock, we erred on the side of
> > more caution not knowing whether it would be guaranteed safe or not.
> >
> > Thanks,
> >
> > Mark
> We should have a look at the history to see how that wait got added.
> However the "dead" flag here means "don't touch this glock" and is there
> so that we can separate the marking dead from the actual removal from
> the list (which simplifies the locking during the scanning procedures)
>
> Steve.
(snip)
> > That is a bit odd. In fact that whole function looks odd. I wonder why it
> > needs to wait in the first place. It should be a simple comparison here.
> > If the glock is dead then nothing else should touch it, so we are safe to
> > add a new one into the hash table. The wait is almost certainly a bug,
> >
> > Steve.
Hi,

Andreas and I both did a ton of work here trying to get this right, and
it was all done for the problems we had with transitioning dinodes from
unlinked to free, and how the glocks beneath them were competing by way
of the in-core inodes. The glocks typically outlive the inodes, but we
had tons of problems with inodes coming and going, their associated
glocks coming and going, and being marked dead, and the rcus underneath
them.

The problem is that inodes are coming and going, some with I_FREE, or
I_WILL_FREE. Glocks are also coming and going, often for the same block
at the same time, but two different inodes for two different files, and
both inode and iopen glocks. The problems stem from remote unlinks
causing "delete work" while the inodes and glocks are both being
allocated and freed in rapid succession. We encountered lots of hangs
and use-after-free problems.

My point is not that we shouldn't fix it, but merely that we need to be
VERY careful here not to reintroduce one of the countless problems we
fixed. Mark's original patch seems pretty low risk to me, but if we go
that route, I'd like to see a smaller timeout; 1HZ seems like a very
long time. Better still to see if there's a better fix that doesn't
break anything.

Bob Peterson
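
For reference, the "dead" flag semantics Steve describes in the quoted
text (mark dead first, unlink from the list later, and let lookup treat
a dead entry as absent) can be sketched in user-space C. This is only an
illustration under loud assumptions: struct entry, lookup(), insert(),
mark_dead() and reap_bucket() are hypothetical names, not the GFS2
code, and the sketch is single-threaded with no locking or RCU, which
is exactly where the hangs and use-after-free problems mentioned above
came from.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NBUCKETS 8

/* Hypothetical stand-in for a glock; NOT the real struct gfs2_glock. */
struct entry {
    unsigned long key;
    bool dead;              /* "don't touch this entry" */
    struct entry *next;     /* hash chain */
};

static struct entry *table[NBUCKETS];

static size_t bucket_of(unsigned long key)
{
    return key % NBUCKETS;
}

/* Lookup is a plain comparison: a dead entry is treated as absent. */
static struct entry *lookup(unsigned long key)
{
    for (struct entry *e = table[bucket_of(key)]; e; e = e->next)
        if (e->key == key && !e->dead)
            return e;
    return NULL;
}

/*
 * Insert a fresh entry at the head of its chain. A dead entry with the
 * same key may still be chained; only a live duplicate is an error.
 */
static void insert(struct entry *e)
{
    size_t b = bucket_of(e->key);

    assert(lookup(e->key) == NULL);
    e->dead = false;
    e->next = table[b];
    table[b] = e;
}

/* Step 1: mark dead. The entry stays on the chain for now. */
static void mark_dead(struct entry *e)
{
    e->dead = true;
}

/* Step 2, done later: unlink every dead entry in one bucket. */
static int reap_bucket(size_t b)
{
    int removed = 0;
    struct entry **pp = &table[b];

    while (*pp) {
        if ((*pp)->dead) {
            *pp = (*pp)->next;
            removed++;
        } else {
            pp = &(*pp)->next;
        }
    }
    return removed;
}
```

In this toy model a new entry can be inserted for a key while the old,
dead entry is still chained, with no waiting. Whether that is safe in
the real code depends on nothing else touching the dead glock, which is
the open question in this thread.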