On Apr 11, 2019, at 1:27 PM, James K. Lowden <jklow...@schemamania.org> wrote:
> 
> On Wed, 10 Apr 2019 15:14:59 -0600
> Warren Young <war...@etr-usa.com> wrote:
> 
>> If you're going to buy some more storage, you should put ZFS on it
>> then, too. :)
> 
> That's interesting advice for a DBMS mailing list.  
> 
> ZFS has built-in transactions, of a sort.

ZFS was in fact designed by former database guys for a company that was heavily 
into the database market at the time, enough so that it is no surprise that 
they were eventually bought by a database company.

So yeah, ZFS and databases go together pretty well. :)

The main thing to watch out for is that your ZFS block size (the "recordsize" 
property) should match the page size of your database, or whatever it calls 
the equivalent structure.  Otherwise, every page-sized write forces ZFS to 
rewrite a whole larger record, and you end up with a write amplification 
problem similar to that created by the 512B/4kB sector mess that happened with 
mass storage several years ago.
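
To put a rough number on that, here is a minimal sketch of the arithmetic. The 
function name is mine, and the model is deliberately naive: it assumes aligned 
single-page writes and ignores compression, metadata, and any coalescing ZFS 
does when several pages in one record change together.

```python
def write_amplification(record_size: int, page_size: int) -> float:
    """Approximate bytes rewritten per byte of one aligned page write.

    Assumption: a write smaller than the record forces a rewrite of the
    whole record; a larger write touches ceil(page/record) records.
    """
    if record_size <= 0 or page_size <= 0:
        raise ValueError("sizes must be positive")
    records_touched = -(-page_size // record_size)  # ceiling division
    return records_touched * record_size / page_size


if __name__ == "__main__":
    # 128 KiB is the ZFS default recordsize; 4 KiB is a common DB page size.
    print(write_amplification(128 * 1024, 4096))
    # Matched sizes: one record rewritten per page written.
    print(write_amplification(4096, 4096))
```

With the default 128 KiB record and 4 KiB pages, that works out to a factor of 
32 in this simple model; with matched sizes it drops to 1.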

That’s an easy thing to achieve with ZFS, because it lets you set the block 
size on a per-filesystem basis, and ZFS filesystems are nearly cost-free to 
create within a ZFS pool.  You can even change the block size after the 
filesystem is created: the old files are still marked with the block size they 
were created with, and new files get the new block size.

Compression is also a per-filesystem attribute, changeable after the FS is 
created, so you can say “compress *these* things, but not these *other* things.”
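
As an illustration of the two per-filesystem settings above, here is a sketch 
that asks SQLite for its page size and builds (but does not run) the matching 
zfs commands. The dataset name "tank/db" and the helper names are made up for 
the example, and "lz4" is just one valid compression value, not a 
recommendation from this thread.

```python
import sqlite3


def sqlite_page_size(path: str = ":memory:") -> int:
    """Return the page size SQLite reports for the given database."""
    conn = sqlite3.connect(path)
    try:
        return conn.execute("PRAGMA page_size").fetchone()[0]
    finally:
        conn.close()


def zfs_size(n: int) -> str:
    """Format a byte count the way zfs properties are usually written."""
    return f"{n // 1024}K" if n % 1024 == 0 else str(n)


def zfs_create_argv(dataset: str, page_size: int, compression: str = "lz4"):
    """argv for creating a filesystem whose recordsize matches the DB page."""
    return [
        "zfs", "create",
        "-o", f"recordsize={zfs_size(page_size)}",
        "-o", f"compression={compression}",
        dataset,
    ]


def zfs_set_recordsize_argv(dataset: str, page_size: int):
    """argv for changing recordsize later; only newly written files use it."""
    return ["zfs", "set", f"recordsize={zfs_size(page_size)}", dataset]


if __name__ == "__main__":
    size = sqlite_page_size()
    print(" ".join(zfs_create_argv("tank/db", size)))  # hypothetical dataset
    print(" ".join(zfs_set_recordsize_argv("tank/db", size)))
```

The commands are only printed here, so you can review them before running 
anything against a real pool.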

> There's enough mediation in
> the filesystem to frustrate the efforts of the DBMS to make sure that
> what's committed in the transaction is, in fact, committed to the
> disk.

Sure, but with ZFS, what *is* on the disk after a crash is always consistent, 
so any decent database engine can recover.

And if by some chance the DB engine + ZFS managed to lose enough data 
consistency that they cannot recover, you can roll back to a recent snapshot, 
which is cheap to create and easy to automate.
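
A sketch of what that automation might look like: timestamped snapshot names 
plus the matching rollback command. The "auto-" naming scheme and the dataset 
name are my own assumptions; the commands are built as argument lists and not 
executed. Note that rolling back to anything other than the most recent 
snapshot requires "-r", which destroys the snapshots taken after it.

```python
from datetime import datetime, timezone


def snapshot_name(dataset: str, when: datetime) -> str:
    """Build a sortable snapshot name such as tank/db@auto-20190411T192700Z."""
    return f"{dataset}@auto-{when.strftime('%Y%m%dT%H%M%SZ')}"


def snapshot_argv(dataset: str, when: datetime):
    """argv for taking a snapshot of the dataset at the given time."""
    return ["zfs", "snapshot", snapshot_name(dataset, when)]


def rollback_argv(snapshot: str, destroy_newer: bool = False):
    """argv for rolling back; -r also destroys any later snapshots."""
    argv = ["zfs", "rollback"]
    if destroy_newer:
        argv.append("-r")
    argv.append(snapshot)
    return argv


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(" ".join(snapshot_argv("tank/db", now)))  # hypothetical dataset
    print(" ".join(rollback_argv(snapshot_name("tank/db", now))))
```

Run from cron or a timer, something along these lines gives you a rolling set 
of restore points to fall back on.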

I’ve even heard of people successfully using ZFS snapshots to make live, 
continuous DB replications from one site to another for fast failover.

> It's really not the ideal substrate for a system that takes its
> fsyncs seriously.

You know, I’ve just realized that it’s been a really long time since I’ve heard 
anyone seriously talk about running databases on raw storage.  It calls into 
question how important, relatively speaking, lack of mediation is in system 
storage design.

Of course raw storage isn’t the main alternative to ZFS.  It’s LVM+md+XFS and 
similar lash-ups, which are even worse in this regard.