On Fri, Oct 22, 2021 at 8:39 AM Miles Malone <[email protected]> wrote:
>
> small files... (Certainly dont quote me here, but wasnt JFS the king
> of that back in the day? I cant quite recall)
>
Deletion is lightning fast on lizardfs due to garbage collection, but metadata on lizardfs is expensive: every inode costs RAM on the master server. I'd never use it for lots of small files. My lizardfs master is using 609MiB for 1,111,394 files (the bulk of which are in snapshots, which create a record for every file inside, so if you snapshot 100k files you end up with 200k records). Figure 1kB per file to be safe (rough math at the end of this mail). Not a big deal if you're storing large files, which is mostly what I'm doing. Performance isn't eye-popping either - I have no idea how well it would work for something like a build system where IOPS matters. For bulk storage of big stuff, though, it is spectacular, and it scales very well.

Cephfs also uses delayed deletion. I have no idea how well it performs or what its metadata costs, though I suspect it is a lot smarter about RAM requirements on the metadata server. Well, maybe - at least in the past it wasn't all that smart about RAM requirements on the object storage daemons. I'd seriously look at it if I were doing anything new.

Distributed filesystems tend to garbage-collect deletions simply due to latency. There are data integrity benefits to synchronous writes, but there is rarely much benefit in blocking on deletions, so why do it? These filesystems already need all kinds of synchronization machinery to cope with node failures, so handling deletions asynchronously is a logical design. Among conventional filesystems, a log-structured filesystem is naturally garbage-collected, but those can have their own issues.
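For anyone sizing a master node, here is a quick back-of-the-envelope sketch in Python of that 1kB rule of thumb (the function and figures are just my own rough assumptions, not anything from the lizardfs docs):

    # Back-of-the-envelope sizing for a lizardfs master, using the ~1kB
    # per file-record rule of thumb above.
    def master_ram_mib(records, bytes_per_record=1024):
        # records = live files + snapshot copies, since snapshots add a
        # record for every file they cover.
        return records * bytes_per_record / (1024 * 1024)

    # My ~1.11M records budgeted at 1kB each -> ~1085 MiB, comfortably
    # above the 609 MiB actually observed (~575 bytes/record), so 1kB
    # is "safe".
    print(round(master_ram_mib(1_111_394)), "MiB")

--
Rich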

