----- Original Message -----
> One of our users has just been in touch and reported slow file deletion (in
> this case the virtual disk for a VM), which was particularly impactful in
> the case of the Citrix Hypervisor control code, as we hold a number of
> locks while deleting VM virtual disks; in this case the file delete took
> ~40 seconds to complete.
>
> Now, we can, and will, work around this in the hypervisor control code by
> dropping the database entries under the locks and then letting the actual
> file deletion occur under the control of our background garbage collection
> process, but it led me to wonder whether the userspace rm operation
> couldn't do something relatively simple to the file's inode data and then
> leave the actual resource group purging to happen in the background. This
> is obviously more complex to handle, and in the case where an rm occurs and
> there is immediately a demand for blocks where the only blocks possibly
> available were assigned to the rm'd file (i.e. the fs was full and the file
> was rm'd to make space), the block allocator would need to wait for the
> cleanup to occur. Would this be something worth considering as a future
> improvement, or is it just too complicated to envisage?
>
> Mark.

Hi Mark,
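The workaround described above (drop the database entries under the locks, then let a background garbage collector do the slow removal) can be sketched generically. This is an illustrative pattern only, not Citrix or gfs2 code; the class and method names are hypothetical:

```python
import os
import queue
import threading

class DeferredDeleter:
    """Illustrative sketch of deferred deletion: make the file disappear
    from its visible name quickly (cheap, so locks are held briefly),
    and let a background thread perform the potentially slow unlink."""

    def __init__(self):
        self._q = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def delete(self, path):
        # Fast path: rename to a tombstone name so callers see the file
        # as gone, then queue the real (possibly slow) unlink.
        tombstone = path + ".deleted"
        os.rename(path, tombstone)
        self._q.put(tombstone)

    def _run(self):
        # Background "garbage collection": the slow unlink happens here,
        # off the caller's critical path.
        while True:
            path = self._q.get()
            os.unlink(path)
            self._q.task_done()
```

The allocator-pressure caveat Mark raises still applies: until the background thread runs, the tombstoned file's blocks remain allocated, so a full filesystem would have to wait for (or trigger) the deferred cleanup.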
I'd let Andreas comment on this, since he was last to work on that part of gfs2, but he's taking the day off today (back Monday). Maybe he'll comment on Monday.

We should try to find out what part of the delete process is taking all the time here. After all, the unlink part of it should be relatively fast, because the freeing of the blocks is done later.

If the files are really big or really fragmented, we can sometimes spend a lot of time waiting for many rgrp glocks, at least in the older versions of the code. The newer versions, where we got rid of the recursive delete, should only need to lock one rgrp at a time, so that should not be an issue.

The actual truncating of the file might take time, flushing transactions and such, especially since a delete forces us to read in all the indirect metadata blocks before freeing them. I think some versions had a broken read-ahead for that part of the code, but Andreas would remember.

Or maybe we're waiting to grab the glock of the directory we're deleting from? For example, maybe there's a "hot" directory that's used in read mode by lots of processes across several nodes, and we need it in rw mode to remove the dirent.

I suppose a finely crafted systemtap script would help figure this all out. Also, what version of gfs2 is running slow?

Regards,

Bob Peterson
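As a first, coarse check before reaching for systemtap, one could simply time the unlink() call itself from userspace to confirm where the ~40 seconds is spent. A minimal sketch, with an illustrative path (point it at a large file on the gfs2 mount under test rather than /tmp):

```python
import os
import time

def timed_unlink(path):
    """Return how long the unlink() system call itself takes, in seconds.

    On gfs2, a slow result here points at the synchronous part of the
    delete (e.g. glock acquisition, dirent removal) rather than the
    block freeing that is deferred to later.
    """
    start = time.monotonic()
    os.unlink(path)
    return time.monotonic() - start

# Usage sketch: create a large sparse file, then time its removal.
# The path below is illustrative; use a file on the gfs2 mount.
path = "/tmp/testfile.img"
with open(path, "wb") as f:
    f.truncate(1 << 30)  # 1 GiB sparse file
print(f"unlink took {timed_unlink(path):.3f}s")
```

If unlink() returns quickly, the latency is elsewhere in the delete path (truncation, glock contention on the directory, and so on), which is where a systemtap or trace-based approach would be needed.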
