Thanks Bob,

I only have the aftermath logs that show the delete being long. My suspicion
is that the file in question was reasonably large and fragmented (possibly a
consequence of some of the patches we carry to avoid rgrp contention on
allocation), so the delete needed to acquire a significant number of rgrp
glocks in order to free the blocks.

Mark
________________________________
From: Bob Peterson <[email protected]>
Sent: Friday, 26 April 2019 16:45
To: Mark Syms <[email protected]>
CC: [email protected]
Subject: Re: [Cluster-devel] GFS2 rm can be very slow


----- Original Message -----
> One of our users has just been in touch and reported slow file deletion (in
> this case the virtual disk for a VM) which was particularly impactful in the
> case of the Citrix Hypervisor control code as we hold a number of locks
> while deleting VM virtual disks and in this case the file delete took ~40
> seconds to complete.
>
> Now, we can, and will, work around this in the hypervisor control code by
> dropping the database entries under the locks and then leave the actual file
> deletion process occur under the control of our background garbage
> collection process, but it led me to wonder whether the userspace rm
> operation couldn't do something relatively simple to the file's inode data
> and then leave the actual resource group purging to happen in the
> background? This is obviously more complex to handle and in the case where
> an rm occurs and then there is immediately a demand for blocks where the
> only blocks possibly available were assigned to the rm'd file (i.e. the fs
> was full and the file was rm'd to make space) the block allocator would need
> to wait for the cleanup to occur. Would this be something worth considering
> as a future improvement or is it just too complicated to envisage?
>
> Mark.
>
Hi Mark,

I'd let Andreas comment on this, since he was last to work on that part of gfs2,
but he's taking the day off today (back Monday). Maybe he'll comment on Monday.

We should try to find out what part of the delete process is taking all the 
time here.

After all, the unlink part of it should be relatively fast because the freeing
of the blocks is done later. If the files are really big or really fragmented,
we can sometimes spend a lot of time waiting for many rgrp glocks, at least in
the older versions of the code. The newer versions, where we got rid of the
recursive delete, should only need to lock one rgrp at a time, so that should
not be an issue.

The actual truncating of the file might take time, flushing transactions and
such, especially since a delete forces us to read in all the indirect metadata
blocks before freeing them. I think some versions had broken read-ahead for
that part of the code, but Andreas would remember.

Or maybe we're waiting to grab the glock of the directory we're deleting from?
For example, maybe there's a "hot" directory that's used in read mode by lots
of processes across several nodes, and we need it in rw mode to remove the
dirent.

I suppose a finely crafted systemtap script would help figure this all out.
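For what it's worth, even before reaching for systemtap, a coarse user-space
check can tell us whether the unlink() syscall itself is where the ~40 seconds
go, as opposed to the deferred background freeing. A minimal sketch (purely
illustrative, not GFS2-specific; the helper name and sizes are made up) that
could be pointed at a directory on the affected mount:

```python
import os
import tempfile
import time

def time_unlink(directory, size_bytes=1 << 20, chunk=1 << 16):
    """Create a file of roughly size_bytes in `directory`, fsync it,
    and return how long the unlink() call itself takes in seconds."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            # Write the file in chunks so it actually allocates blocks.
            block = b"\0" * chunk
            written = 0
            while written < size_bytes:
                f.write(block)
                written += chunk
            f.flush()
            os.fsync(f.fileno())
        # Time only the unlink syscall, not the file creation.
        start = time.monotonic()
        os.unlink(path)
        return time.monotonic() - start
    finally:
        # Clean up if something failed before the unlink.
        if os.path.exists(path):
            os.unlink(path)

if __name__ == "__main__":
    elapsed = time_unlink(tempfile.gettempdir())
    print("unlink took %.3f s" % elapsed)
```

If the syscall comes back fast, the time is being spent elsewhere (evict /
truncate / rgrp work) and a systemtap script on the kernel side would be the
next step.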

Also, what version of gfs2 is running slow?

Regards,

Bob Peterson
