I use ZFS over VDO because I’m more familiar with it and it suits my use case 
better. I got similar results from performance tests, with VDO slightly 
outperforming on writes and ZFS outperforming on reads. That was before I added 
a ZIL and cache to my ZFS disks, too. I also don’t like that you have to 
specify estimated sizes with VDO for compression; I prefer the ZFS approach. 
Don’t forget to set the appropriate ZFS attributes, the parts of the Gluster 
doc covering those are still valid.
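From memory, the ones that matter are roughly the following, but double-check 
the Gluster-on-ZFS doc for the full list (the pool/dataset name here is just a 
placeholder):

  zfs set xattr=sa tank/brick          # gluster leans heavily on xattrs
  zfs set acltype=posixacl tank/brick
  zfs set compression=lz4 tank/brick
  zfs set atime=off tank/brick         # no need for access-time updates on a brick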

A few more comments inline:

> On Apr 16, 2019, at 5:09 PM, Cody Hill <[email protected]> wrote:
> 
> Hey folks.
> 
> I’m looking to deploy GlusterFS to host some VMs. I’ve done a lot of reading 
> and would like to implement Deduplication and Compression in this setup. My 
> thought would be to run ZFS to handle the Compression and Deduplication.
> 
> ZFS would give me the following benefits:
> 1. If a single disk fails rebuilds happen locally instead of over the network

I actually run mine in a pure stripe for best performance. If a disk fails and 
SMART warnings didn’t give me enough time to replace it in place first, I’ll 
rebuild over the network. I have 10G of course, and currently < 10TB of data, 
so I consider that reasonable. I also decided I’d rather present one large 
brick than many smaller bricks; in some tests others have done, that has shown 
benefits for gluster healing.
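By “pure stripe” I just mean a pool with no redundancy, something like this 
(device paths are placeholders, and ashift=12 assumes 4K-sector drives):

  zpool create -o ashift=12 tank \
    /dev/disk/by-id/ata-DISK1 \
    /dev/disk/by-id/ata-DISK2 \
    /dev/disk/by-id/ata-DISK3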

> 2. Zil & L2Arc should add a slight performance increase

Yes. Get the absolute fastest ZIL you can, but any modern enterprise SSD will 
still give you some benefit. Over-provision these: you probably only need 
4-15GB for the ZIL (1G networking vs 10G), and I use only 90% of the cache 
drive so the SSD can work at its best. Cache effectiveness depends on your 
workload, so monitor and/or test with and without it.
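Adding them after the fact is just this (partition/device names are 
placeholders; partition the SSD first if you’re over-provisioning it):

  zpool add tank log /dev/disk/by-id/nvme-SSD1-part1     # SLOG / ZIL device
  zpool add tank cache /dev/disk/by-id/nvme-SSD1-part2   # L2ARC cache device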

> 3. Deduplication and Compression are inline and have pretty good performance 
> with modern hardware (Intel Skylake)

LZ4 compression is great. As others have said, I’d avoid deduplication 
altogether. Especially in a gluster environment, why waste the RAM and do the 
work multiple times?
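For what it’s worth, dedup is already off by default, and compression is one 
property plus a quick check on the ratio later (pool name is a placeholder):

  zfs set compression=lz4 tank
  zfs get compressratio,dedup tank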

> 4. Automated Snapshotting

Be careful doing this “underneath” the gluster layer: you’re snapshotting only 
that replica, and it’s not guaranteed to be in sync with the others. At best 
you’re making a point-in-time backup of one node, maybe useful for off-system 
backups with zfs streaming, but I’d consider gluster geo-rep first. And it 
won’t work at all if you are not running a pure replica.
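The zfs streaming I mean is the usual snapshot/send pair, roughly (dataset, 
snapshot names, and backup host are placeholders):

  zfs snapshot tank/brick@nightly-20190416
  zfs send -i tank/brick@nightly-20190415 tank/brick@nightly-20190416 | \
    ssh backuphost zfs receive -F backup/brick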

> I can then layer GlusterFS on top to handle distribution to allow 3x Replicas 
> of my storage.
> My question is… Why aren’t more people doing this? Is this a horrible idea 
> for some reason that I’m missing? I’d be very interested to hear your 
> thoughts.
> 
> Additional thoughts:
> I’d like to use Ganesha pNFS to connect to this storage. (Any issues here?)

I’d just use glusterfs gfapi mounts, but if you want to go NFS, sure. Make 
sure you’re ready to support Ganesha; it doesn’t seem to be as well integrated 
in the latest gluster releases. Caveat: I don’t use it myself.
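For VMs, gfapi access from qemu looks something like this (host, volume, and 
image names are placeholders):

  qemu-system-x86_64 ... \
    -drive file=gluster://gluster1/gv0/vm1.qcow2,format=qcow2,if=virtio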

> I think I’d need KeepAliveD across these 3x nodes to store in the FSTAB (Is 
> this correct?)

There are easier ways. I use a simple DNS round robin to a name (which I can 
put in the hosts files on the servers/clients to avoid bootstrap issues when 
the local DNS is a VM ;) ) and set the backup-server option so clients can 
switch automatically if a node fails. Or you can mount localhost: with a 
converged cluster, again with the backup-server options for best results.
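In fstab terms that works out to roughly this (names are placeholders; the 
actual mount option is backup-volfile-servers):

  gluster-rr:/gv0  /var/lib/vms  glusterfs  defaults,_netdev,backup-volfile-servers=gluster2:gluster3  0 0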

> I’m also thinking about creating a “Gluster Tier” of 512GB of Intel Optane 
> DIMM to really smooth out write latencies… Any issues here?

Gluster tiering is currently being dropped from support; until/unless it comes 
back, I’d use the Optanes as cache/ZIL or just make a separate fast pool out of 
them.
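A separate fast pool is just its own zpool/brick on the Optane devices, e.g. 
(device names are placeholders, assuming they show up as pmem block devices):

  zpool create -o ashift=12 fastpool mirror /dev/pmem0 /dev/pmem1
  zfs create fastpool/brick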

_______________________________________________
Gluster-users mailing list
[email protected]
https://lists.gluster.org/mailman/listinfo/gluster-users
