On Apr 11, 2019, at 1:27 PM, James K. Lowden <jklow...@schemamania.org> wrote:
>
> On Wed, 10 Apr 2019 15:14:59 -0600
> Warren Young <war...@etr-usa.com> wrote:
>
>> If you’re going to buy some more storage, you should put ZFS on it
>> then, too. :)
>
> That's interesting advice for a DBMS mailing list.
>
> ZFS has built-in transactions, of a sort.

ZFS was in fact designed by former database guys for a company that was heavily into the database market at the time, enough so that it is no surprise that they were eventually bought by a database company. So yeah, ZFS and databases go together pretty well. :)

The main thing to watch out for is that your ZFS block size be the same as the page size of your database, or whatever it calls the equivalent structure. Otherwise, you end up with a write amplification problem similar to that created by the 512B/4kB mess that happened with mass storage several years ago.

That’s an easy thing to achieve with ZFS, because it lets you set the block size on a per filesystem basis, and ZFS filesystems are nearly cost-free to create within a ZFS pool. You can even change the block size after the filesystem is created: the old files are still marked with the block size they were created with, and new files get the new block size.

Compression is also a per-filesystem attribute, changeable after the FS is created, so you can say “compress *these* things, but not these *other* things.”

> There's enough mediation in
> the filesystem to frustrate the efforts of the DBMS to make sure that
> what's committed in the transaction is, in fact, committed to the
> disk.

Sure, but what *is* on the disk after a crash is always consistent with ZFS, so any decent database engine can recover.

And if by some chance the DB engine + ZFS managed to lose enough data consistency that they cannot recover, you can roll back to a recent snapshot, which is cheap to create and easy to automate. I’ve even heard of people successfully using ZFS snapshots to make live, continuous DB replications from one site to another for fast failover.

> It's really not the ideal substrate for a system that takes its
> fsyncs seriously.

You know, I’ve just realized that it’s been a really long time since I’ve heard anyone seriously talk about running databases on raw storage.
It calls into question how important, relatively speaking, lack of mediation is in system storage design.

Of course raw storage isn’t the main alternative to ZFS. It’s LVM+md+XFS and similar lash-ups, which are even worse in this regard.
_______________________________________________
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users
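
To put rough numbers on the block-size point made earlier in this message, here is a minimal Python sketch. It reads the page size SQLite reports via PRAGMA page_size and estimates write amplification under a deliberately simplified model: page writes are aligned, and ZFS rewrites a whole record whenever any part of it changes. The ZFS property involved is recordsize. The dataset name tank/db, the function names, and the model itself are assumptions made for illustration; none of them come from the thread.

```python
import sqlite3


def sqlite_page_size(path: str = ":memory:") -> int:
    """Return the page size, in bytes, that SQLite reports for a database."""
    conn = sqlite3.connect(path)
    try:
        return conn.execute("PRAGMA page_size").fetchone()[0]
    finally:
        conn.close()


def write_amplification(page_size: int, record_size: int) -> float:
    """Bytes rewritten on disk per byte of database page written.

    Simplified model (an assumption, not a measurement): one aligned page
    write, and every record the page touches is rewritten in full.
    """
    if page_size <= 0 or record_size <= 0:
        raise ValueError("sizes must be positive")
    # Number of records the page overlaps, rounded up.
    records_touched = -(-page_size // record_size)
    return records_touched * record_size / page_size


def recordsize_command(dataset: str, page_size: int) -> str:
    """Build (but do not run) a zfs command matching recordsize to the page size."""
    if page_size % 1024 == 0:
        size = f"{page_size // 1024}K"
    else:
        size = str(page_size)
    return f"zfs set recordsize={size} {dataset}"


if __name__ == "__main__":
    page = sqlite_page_size()
    # 128 KiB is used here as the mismatched record size for comparison.
    for record in (128 * 1024, page):
        factor = write_amplification(page, record)
        print(f"page={page} record={record} amplification={factor:.1f}x")
    # "tank/db" is a hypothetical dataset name.
    print(recordsize_command("tank/db", page))
```

Under this model, a 4096-byte page on a 131072-byte record gives a factor of 32, and matching the two sizes brings it to 1. Real behavior also depends on compression, caching, and how the engine batches its writes, so treat the figure as an upper-bound intuition rather than a prediction.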