RE: [gentoo-user] [OT] how to delete a directory tree really fast
> -----Original Message-----
> From: Rich Freeman
> Sent: Friday, October 22, 2021 12:29 PM
> To: gentoo-user@lists.gentoo.org
> Subject: Re: [gentoo-user] [OT] how to delete a directory tree really fast
>
> On Fri, Oct 22, 2021 at 3:21 PM Helmut Jarausch wrote:
> >
> > Is it possible to have a hard link from one subvolume to a different
> > one?
>
> You could do a quick test, but I don't think so. I haven't used btrfs in
> years, but they're basically separate filesystems as far as most commands
> are concerned. I don't think you can create reflinks between subvolumes
> either.
>
> The files are already reflinked by design, though. You'd just make a new
> snapshot and then rsync over it. Anything that doesn't change will already
> share space on disk by virtue of the snapshot. Anything that does change
> will only be modified on the snapshot you target with rsync. I'm not sure
> why you'd want to use a hardlink - it doesn't provide the isolation you
> already get from the snapshot.
>
> --
> Rich

So the BTRFS filesystem itself supports hardlinks and reflinks between
subvolumes, because it has to for writable snapshots to work correctly. The
utilities, on the other hand, have not all read that memo, so actually making
it do what you want can sometimes be a bit frustrating.

Note also that all these garbage-collected filesystems are basically doing
the equivalent of "mv to-delete .deleted ; ionice -c3 rm -rf .deleted". The
files all seem to disappear instantly, but you don't get your space back
until the garbage collector has had a chance to grovel over all the metadata.
Groveling over the metadata is the part that takes a long time for the rm
command. The main advantage of garbage collection is that if you need to
reboot in the middle of it, the filesystem will automatically pick up where
it left off when it is mounted again.

But yes, in the future, if you're building a massive directory tree that
you're planning to delete, put it in a subvolume. That lets you do all kinds
of useful things with it.

LMP
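[Editor's note: the "mv ; ionice rm -rf" pattern quoted above can be made
concrete as a small runnable sketch. The "to-delete" tree and ".deleted"
name are invented for the example; the ionice fallback is there only so the
sketch runs on systems without util-linux.]

```shell
#!/bin/sh
# Runnable sketch of the pattern quoted above: rename first, reclaim later.
mkdir -p to-delete/a/b
touch to-delete/a/b/f1 to-delete/a/b/f2

mv to-delete .deleted            # instant: the tree vanishes from its old path
if command -v ionice >/dev/null 2>&1; then
    ionice -c3 rm -rf .deleted & # the slow metadata walk runs at idle I/O priority
else
    rm -rf .deleted &
fi
df -h . | tail -n 1              # space is not necessarily freed yet...
wait                             # ...only once the background delete completes
```

As the message says, a garbage-collected filesystem effectively does the
backgrounding step for you, and survives a reboot in the middle.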
Re: [gentoo-user] [OT] how to delete a directory tree really fast
On Fri, Oct 22, 2021 at 3:21 PM Helmut Jarausch wrote:
>
> Is it possible to have a hard link from one subvolume to a different
> one?

You could do a quick test, but I don't think so. I haven't used btrfs in
years, but they're basically separate filesystems as far as most commands
are concerned. I don't think you can create reflinks between subvolumes
either.

The files are already reflinked by design, though. You'd just make a new
snapshot and then rsync over it. Anything that doesn't change will already
share space on disk by virtue of the snapshot. Anything that does change
will only be modified on the snapshot you target with rsync. I'm not sure
why you'd want to use a hardlink - it doesn't provide the isolation you
already get from the snapshot.

--
Rich
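[Editor's note: the snapshot-then-rsync approach Rich describes looks
roughly like this on btrfs. The /mnt/pool mount point, the backup-day*
subvolume names, and the source path are all assumptions for illustration;
the commands only run if a btrfs filesystem is actually mounted there.]

```shell
#!/bin/sh
# Sketch of snapshot-then-rsync; /mnt/pool and the backup-* names are invented.
POOL=/mnt/pool
SRC=/home/data
if command -v btrfs >/dev/null 2>&1 && btrfs filesystem df "$POOL" >/dev/null 2>&1; then
    # Writable snapshot: initially shares every extent with yesterday's backup.
    btrfs subvolume snapshot "$POOL/backup-day1" "$POOL/backup-day2"
    # rsync rewrites only changed files; unchanged ones keep sharing space.
    rsync -a --delete "$SRC/" "$POOL/backup-day2/"
else
    echo "no btrfs filesystem at $POOL - treat the commands above as a sketch"
fi
```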
Re: [gentoo-user] [OT] how to delete a directory tree really fast
On 10/22/2021 06:15:58 PM, Vitor Hugo Nunes dos Santos wrote:
> The real solution would have been having a subvolume for the directory.
> Subvolume deletion on BTRFS is near instant.
> Same for ZFS with datasets, etc.

Thanks!
Is it possible to have a hard link from one subvolume to a different one?

October 22, 2021 9:50 AM, "Rich Freeman" wrote:

> On Fri, Oct 22, 2021 at 8:39 AM Miles Malone wrote:
>
>> small files... (Certainly don't quote me here, but wasn't JFS the king
>> of that back in the day? I can't quite recall)
>
> It is lightning fast on lizardfs due to garbage collection, but
> metadata on lizardfs is expensive, requiring RAM on the master server
> for every inode. I'd never use it for lots of small files.
>
> My lizardfs master is using 609MiB for 1,111,394 files (the bulk of
> which are in snapshots, which create records for every file inside, so
> if you snapshot 100k files you end up with 200k files). Figure 1kB
> per file to be safe. Not a big deal if you're storing large files
> (which is what I'm mostly doing). Performance isn't eye-popping
> either - I have no idea how well it would work for something like a
> build system where IOPS matters. For bulk storage of big stuff though
> it is spectacular, and scales very well.
>
> Cephfs also uses delayed deletion. I have no idea how well it
> performs, or what the cost of metadata is, though I suspect it is a
> lot smarter about RAM requirements on the metadata server. Well,
> maybe, at least in the past it wasn't all that smart about RAM
> requirements on the object storage daemons. I'd seriously look at it
> if doing anything new.
>
> Distributed filesystems tend to be garbage collected simply due to
> latency. There are data integrity benefits to synchronous writes, but
> there is rarely much benefit in blocking on deletions, so why do it?
> These filesystems already need all kinds of synchronization
> capabilities due to node failures, so syncing deletions is just a
> logical design.
>
> For conventional filesystems a log-based filesystem is naturally
> garbage-collected, but those can have their own issues.
>
> --
> Rich
Re: [gentoo-user] [OT] how to delete a directory tree really fast
The real solution would have been having a subvolume for the directory.
Subvolume deletion on BTRFS is near instant.
Same for ZFS with datasets, etc.

October 22, 2021 9:50 AM, "Rich Freeman" wrote:

> On Fri, Oct 22, 2021 at 8:39 AM Miles Malone wrote:
>
>> small files... (Certainly don't quote me here, but wasn't JFS the king
>> of that back in the day? I can't quite recall)
>
> It is lightning fast on lizardfs due to garbage collection, but
> metadata on lizardfs is expensive, requiring RAM on the master server
> for every inode. I'd never use it for lots of small files.
>
> My lizardfs master is using 609MiB for 1,111,394 files (the bulk of
> which are in snapshots, which create records for every file inside, so
> if you snapshot 100k files you end up with 200k files). Figure 1kB
> per file to be safe. Not a big deal if you're storing large files
> (which is what I'm mostly doing). Performance isn't eye-popping
> either - I have no idea how well it would work for something like a
> build system where IOPS matters. For bulk storage of big stuff though
> it is spectacular, and scales very well.
>
> Cephfs also uses delayed deletion. I have no idea how well it
> performs, or what the cost of metadata is, though I suspect it is a
> lot smarter about RAM requirements on the metadata server. Well,
> maybe, at least in the past it wasn't all that smart about RAM
> requirements on the object storage daemons. I'd seriously look at it
> if doing anything new.
>
> Distributed filesystems tend to be garbage collected simply due to
> latency. There are data integrity benefits to synchronous writes, but
> there is rarely much benefit in blocking on deletions, so why do it?
> These filesystems already need all kinds of synchronization
> capabilities due to node failures, so syncing deletions is just a
> logical design.
>
> For conventional filesystems a log-based filesystem is naturally
> garbage-collected, but those can have their own issues.
>
> --
> Rich
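[Editor's note: a sketch of the subvolume approach suggested above, for
future trees you know you will throw away. The /mnt/btrfs mount point and
the "scratch" name are assumptions; the commands only run if a btrfs
filesystem is actually mounted there.]

```shell
#!/bin/sh
# Create the throwaway tree as its own subvolume up front, so dropping it is
# one metadata operation instead of an rm -rf walk over 55,000 files.
MNT=/mnt/btrfs
if command -v btrfs >/dev/null 2>&1 && btrfs filesystem df "$MNT" >/dev/null 2>&1; then
    btrfs subvolume create "$MNT/scratch"
    # ... populate $MNT/scratch (build tree, link-dest backups, etc.) ...
    # Near-instant regardless of file count; space is reclaimed afterwards
    # in the background by the btrfs cleaner thread.
    btrfs subvolume delete "$MNT/scratch"
else
    echo "no btrfs filesystem at $MNT - commands above are illustrative only"
fi
```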
Re: [gentoo-user] [OT] how to delete a directory tree really fast
On Fri, Oct 22, 2021 at 8:39 AM Miles Malone wrote:
>
> small files... (Certainly don't quote me here, but wasn't JFS the king
> of that back in the day? I can't quite recall)

It is lightning fast on lizardfs due to garbage collection, but metadata on
lizardfs is expensive, requiring RAM on the master server for every inode.
I'd never use it for lots of small files.

My lizardfs master is using 609MiB for 1,111,394 files (the bulk of which
are in snapshots, which create records for every file inside, so if you
snapshot 100k files you end up with 200k files). Figure 1kB per file to be
safe. Not a big deal if you're storing large files (which is what I'm mostly
doing). Performance isn't eye-popping either - I have no idea how well it
would work for something like a build system where IOPS matters. For bulk
storage of big stuff, though, it is spectacular, and scales very well.

Cephfs also uses delayed deletion. I have no idea how well it performs, or
what the cost of metadata is, though I suspect it is a lot smarter about RAM
requirements on the metadata server. Well, maybe; at least in the past it
wasn't all that smart about RAM requirements on the object storage daemons.
I'd seriously look at it if doing anything new.

Distributed filesystems tend to be garbage collected simply due to latency.
There are data integrity benefits to synchronous writes, but there is rarely
much benefit in blocking on deletions, so why do it? These filesystems
already need all kinds of synchronization capabilities due to node failures,
so syncing deletions is just a logical design.

For conventional filesystems a log-based filesystem is naturally
garbage-collected, but those can have their own issues.

--
Rich
Re: [gentoo-user] [OT] how to delete a directory tree really fast
And honestly, expanding on what Rich said... Given that your particular
circumstances, with the extensive number of hardlinks, are pretty specific,
I reckon you might be best off just setting up a small-scale test of some
options and profiling it. Converting it all to a btrfs subvolume might be a
realistic option, or might take an order of magnitude more time than just
waiting for it all to delete, or than all of the various move tricks
mentioned previously.

If this were an "I know I need to do this in the future, what should I do"
question, then you'd either put it all in a subvolume to begin with, or
select the file system specifically for its speed at deleting small files...
(Certainly don't quote me here, but wasn't JFS the king of that back in the
day? I can't quite recall)

On Fri, 22 Oct 2021 at 22:29, Rich Freeman wrote:
>
> On Fri, Oct 22, 2021 at 7:36 AM Helmut Jarausch wrote:
> >
> > There are more than 55,000 files in the tree, which is located
> > on a BTRFS file system.
> > Standard 'rm -rf' is really slow.
> >
> > Is there anything I can do about this?
>
> I don't have any solid suggestions as I haven't used btrfs in a while.
> File deletion speed is something that is very filesystem-specific, but
> on most it tends to be slow.
>
> An obvious solution would be garbage collection, which is something
> used by some filesystems, but I'm not aware of any mainstream ones.
> You can sort of get that behavior by renaming a directory before
> deleting it. Suppose you have a directory created by a build system
> and you want to do a new build. Deleting the directory takes a long
> time. So, first you rename it to something else (or move it someplace
> on the same filesystem, which is fast), then you kick off your build,
> which no longer sees the old directory, and then you can delete the
> old directory slowly at your leisure. Of course, as with all garbage
> collection, you need to have the spare space to hold the data while it
> gets cleaned up.
>
> I'm not sure if btrfs is any faster at deleting snapshots/reflinks
> than hard links. I suspect it wouldn't be, but you could test that.
> Instead of populating a directory with hard links, create a snapshot
> of the directory tree, and then rsync over it/etc. The result looks
> the same but is COW copies. Again, I'm not sure that btrfs will be
> any faster at deleting reflinks than hard links though - they're both
> similar metadata operations. I see there is a patch in the works for
> rsync that uses reflinks instead of hard links to do it all in one
> command. That has a lot of benefits, but again I'm not sure if it
> will help with deletion.
>
> You could also explore other filesystems that may or may not have
> faster deletion, or look to see if there is any way to optimize it on
> btrfs.
>
> If you can spare the space, the option of moving the directory to make
> it look like it was deleted will work on basically any filesystem. If
> you want to further automate it, you could move it to a tmp directory
> on the same filesystem and have tmpreaper do your garbage collection.
> Consider using ionice to run it at a lower priority, but I'm not sure
> how much impact that has on metadata operations like deletion.
>
> --
> Rich
Re: [gentoo-user] [OT] how to delete a directory tree really fast
On Fri, Oct 22, 2021 at 7:36 AM Helmut Jarausch wrote:
>
> There are more than 55,000 files in the tree, which is located
> on a BTRFS file system.
> Standard 'rm -rf' is really slow.
>
> Is there anything I can do about this?

I don't have any solid suggestions as I haven't used btrfs in a while.
File deletion speed is something that is very filesystem-specific, but on
most it tends to be slow.

An obvious solution would be garbage collection, which is something used by
some filesystems, but I'm not aware of any mainstream ones. You can sort of
get that behavior by renaming a directory before deleting it. Suppose you
have a directory created by a build system and you want to do a new build.
Deleting the directory takes a long time. So, first you rename it to
something else (or move it someplace on the same filesystem, which is fast),
then you kick off your build, which no longer sees the old directory, and
then you can delete the old directory slowly at your leisure. Of course, as
with all garbage collection, you need to have the spare space to hold the
data while it gets cleaned up.

I'm not sure if btrfs is any faster at deleting snapshots/reflinks than hard
links. I suspect it wouldn't be, but you could test that. Instead of
populating a directory with hard links, create a snapshot of the directory
tree, and then rsync over it/etc. The result looks the same but is COW
copies. Again, I'm not sure that btrfs will be any faster at deleting
reflinks than hard links though - they're both similar metadata operations.
I see there is a patch in the works for rsync that uses reflinks instead of
hard links to do it all in one command. That has a lot of benefits, but
again I'm not sure if it will help with deletion.

You could also explore other filesystems that may or may not have faster
deletion, or look to see if there is any way to optimize it on btrfs.

If you can spare the space, the option of moving the directory to make it
look like it was deleted will work on basically any filesystem. If you want
to further automate it, you could move it to a tmp directory on the same
filesystem and have tmpreaper do your garbage collection. Consider using
ionice to run it at a lower priority, but I'm not sure how much impact that
has on metadata operations like deletion.

--
Rich
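[Editor's note: Rich's "move it to a tmp directory and garbage-collect
later" idea, sketched with made-up names. The .trash directory and mktemp
naming are assumptions, and the final rm stands in for tmpreaper so the
example is self-contained.]

```shell
#!/bin/sh
# Sketch of the move-then-reap idea; .trash and old-tree are invented names.
mkdir -p old-tree/sub
touch old-tree/sub/file1 old-tree/sub/file2
mkdir -p .trash

# "Deleting" is now a single cheap rename on the same filesystem:
dest=$(mktemp -d .trash/rm.XXXXXX)
mv old-tree "$dest/"

# Later, a cron job (tmpreaper, or a plain rm at idle priority) empties .trash:
if command -v ionice >/dev/null 2>&1; then
    ionice -c3 rm -rf .trash
else
    rm -rf .trash
fi
```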
[gentoo-user] [OT] how to delete a directory tree really fast
Hi,

Is there anything faster than rm -rf ?

I'm using rsync with the --link-dest= option. Since this option uses hard
links extensively, both the target and the link-dest directory have to be on
the same file system. Therefore, just re-making the file system anew cannot
be used.

There are more than 55,000 files in the tree, which is located on a BTRFS
file system. Standard 'rm -rf' is really slow.

Is there anything I can do about this?

Many thanks for a hint,
Helmut
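[Editor's note: for readers unfamiliar with the setup Helmut describes, a
minimal --link-dest layout looks like this. The "work", "src", and day1/day2
names are invented for illustration; note that --link-dest paths are
resolved relative to the destination directory.]

```shell
#!/bin/sh
# Minimal reconstruction of an rsync --link-dest backup rotation.
if command -v rsync >/dev/null 2>&1; then
    mkdir -p work/src work/backups
    echo "unchanged content" > work/src/a.txt

    # First backup: a full copy.
    rsync -a work/src/ work/backups/day1/

    # Second backup: unchanged files become hard links into day1, not copies,
    # which is why both snapshots must live on the same filesystem.
    rsync -a --link-dest=../day1 work/src/ work/backups/day2/
fi
```

Deleting an old snapshot then means removing a large tree of hard links,
which is exactly the slow rm -rf case the thread discusses.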
Re: [gentoo-user] Anyone using www-apps/jekyll?
On Thursday, 21 October 2021 21:22:58 BST Andreas K. Huettel wrote:
> On Thursday, 21 October 2021, 18:11:27 CEST, Peter Humphrey wrote:
> > Hello list,
> >
> > I wanted to try this package to create a small site for myself, but I'm
> > falling at the second hurdle (the first was setting package.env etc. to
> > pull in ruby26 as well as the currently installed ruby30).
> >
> > Does anyone have experience with this builder? I'd like to find out where
> > I'm going wrong first.
>
> It's used for www.gentoo.org :)
>
> https://gitweb.gentoo.org/sites/www.git

Well, whaddyaknow? :)

--
Regards,
Peter.