> On Aug 27, 2015, at 10:46 AM, Allan Jude <allanj...@freebsd.org> wrote:
> On 2015-08-27 02:10, Marcus Reid wrote:
>> On Wed, Aug 26, 2015 at 05:25:52PM -0400, Vick Khera wrote:
>>> I'm running FreeBSD inside a VM that is providing the virtual disks backed
>>> by several ZFS zvols on the host. I want to run ZFS on the VM itself too
>>> for simplified management and backup purposes.
>>> The question I have is on the VM guest, do I really need to run a raid-z or
>>> mirror or can I just use a single virtual disk (or even a stripe)? Given
>>> that the underlying storage for the virtual disk is a zvol on a raid-z
>>> there should not really be too much worry for data corruption, I would
>>> think. It would be equivalent to using a hardware raid for each component
>>> of my zfs pool.
>>> Opinions? Preferably well-reasoned ones. :)
>> This is a frustrating situation, because none of the options that I can
>> think of look particularly appealing.  Single-vdev pools would be the
>> best option; your redundancy is already taken care of by the host's
>> pool.  The overhead of checksumming, etc. twice is probably not super
>> bad.  However, having the ARC eating up lots of memory twice seems
>> pretty bletcherous.  You can probably do some tuning to reduce that, but
>> I never liked tuning the ARC much.
>> All the nice features ZFS brings to the table are hard to give up once
>> you get used to having them around, so I understand your quandary.
>> Marcus
> You can just:
> zfs set primarycache=metadata poolname
> And it will only cache metadata in the ARC inside the VM, and avoid
> caching data blocks, which will be cached outside the VM. You could even
> turn the primarycache off entirely.
> -- 
> Allan Jude

> On Aug 27, 2015, at 1:20 PM, Paul Vixie <p...@redbarn.org> wrote:
> let me ask a related question: i'm using FFS in the guest, zvol on the
> host. should i be telling my guest kernel to not bother with an FFS
> buffer cache at all, or to use a smaller one, or what?

Whether we are talking ffs, ntfs or zpool atop zvol, unfortunately there are 
really no simple answers. You must consider your use case, the host and vm 
hardware/software configuration, perform meaningful benchmarks and, if you care 
about data integrity, thorough tests of the likely failure modes (all far more 
easily said than done). I’m curious to hear more about your use case(s) and
setups so I can offer better insight into which alternatives make more or less
sense for you. Performance needs? Are you striving for lower individual latency
or higher combined throughput? How critical are integrity and availability? How
do you prefer your backup routine? Do you handle that in the guest or on the
host? Want features like dedup and/or L2ARC in the mix? (If so, everything
bears reconsideration; expect to roughly triple your research and testing
effort.)

Sorry, I’m really not trying to scare anyone away from ZFS. It is awesome and 
capable of providing amazing solutions with very reliable and sensible behavior 
if handled with due respect, fear, monitoring and upkeep. :)

There are cases to be made for caching [meta]data in the child or the parent,
for checksumming in the child, the parent or both, and for compressing in the
child or the parent. I believe `gstat`, along with your custom-made benchmark
or test load, will greatly help guide you.
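A minimal sketch of that kind of monitoring, to run on the host while the guest's benchmark or test load is active. It assumes zvols appear in gstat(8) as GEOM providers named zvol/<pool>/<volume>, and that with -b gstat collects over one -I interval, prints once and exits; check gstat(8) on your release before relying on either. The watch_zvols helper, its defaults and the file names are hypothetical.

```shell
#!/bin/sh
# watch_zvols [REGEX] [SAMPLES]   (hypothetical helper; run on the host)
# Takes SAMPLES consecutive gstat(8) snapshots, one second each, of the GEOM
# providers whose names match REGEX, timestamping each one so the output can
# be lined up with the guest's benchmark or test load.
watch_zvols() {
    filter=${1:-^zvol/}    # assumed zvol provider naming: zvol/<pool>/<volume>
    samples=${2:-10}
    i=0
    while [ "$i" -lt "$samples" ]; do
        date
        gstat -b -I 1s -f "$filter"    # -b: batch mode, print once and exit
        i=$((i + 1))
    done
}

# Example (hypothetical file names):
#   sh watch-zvols.sh '^zvol/tank/vm/' 60 > gstat.log
if [ "$#" -gt 0 ]; then
    watch_zvols "$@"
fi
```

Keeping a log per benchmark run makes it easy to compare host-side busy percentages and latencies before and after changing a setting.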

ZFS on ZFS seems to be a hardly studied, seldom reported, never documented, 
tedious exercise. Prepare for accelerated greying and balding of your hair. The 
parent's volblocksize, the child's ashift, alignment, and interactions involving
raidz stripes (if used) can cause problems ranging from slightly decreased
performance and storage efficiency to pathological write amplification within
ZFS, with performance and responsiveness crashing and sinking to the bottom of
the ocean. Some datasets can become veritable black holes to VFS system calls.
You may see ZFS reporting elusive errors, deadlocking or panicking altogether
in the child or the parent. With diligence though, stable and performant setups
can be discovered for many production situations.
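To put rough numbers on the volblocksize/ashift/alignment part, here is a deliberately simplified model (an assumption for illustration, not ZFS's exact accounting): the parent zvol stores data in fixed volblocksize blocks, so a child write that only partly covers a block makes the parent rewrite that whole block. Raidz parity and padding, compression, metadata and ZIL traffic are all ignored, so real amplification can be worse. The helper names are hypothetical.

```shell
#!/bin/sh
# Simplified model only: assumes whole-volblock copy-on-write in the parent and
# ignores raidz parity/padding, compression, metadata and ZIL traffic.

# parent_bytes OFFSET SIZE VOLBLOCKSIZE   (bytes; SIZE > 0)
# Prints how many bytes the parent zvol rewrites for one child write.
parent_bytes() {
    off=$1 size=$2 vbs=$3
    first=$(( off / vbs ))                 # first volblock touched
    last=$(( (off + size - 1) / vbs ))     # last volblock touched
    echo $(( (last - first + 1) * vbs ))
}

# amplification_pct OFFSET SIZE VOLBLOCKSIZE
# Prints parent bytes written per child byte, as an integer percentage.
amplification_pct() {
    echo $(( $(parent_bytes "$1" "$2" "$3") * 100 / $2 ))
}

# A child pool with ashift=12 issuing a 4 KiB write:
amplification_pct 0 4096 8192       # 8K volblocksize:  prints 200
amplification_pct 0 4096 16384      # 16K volblocksize: prints 400
# An 8 KiB write, aligned vs. straddling two 8K volblocks:
amplification_pct 0 8192 8192       # prints 100
amplification_pct 4096 8192 8192    # prints 200
```

Even in this toy model, 4 KiB child writes onto a 16K volblocksize zvol make the parent write four times the data, and a write straddling a volblock boundary doubles the cost; benchmark before trusting any of it.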

For example, for a zpool (whether used by a VM or not, locally, through iSCSI,
ggate[cd], or whatever) atop a zvol that sits on a parent zpool with no
redundancy, I would set primarycache=metadata checksum=off compression=off for
the zvol(s) on the host(s) and, for the most part, just use the same zpool
settings and sysctl tunings in the VM (or child zpool, whatever role it may
play) that I would otherwise use on bare CPU and bare drives (defaults +
compression=lz4 atime=off). However, that simple case is likely not yours.
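Those host and guest settings as a small script sketch. Dataset and pool names are hypothetical, and it sets one property per zfs set call because older releases may accept only one property per invocation.

```shell
#!/bin/sh
# Settings for a child zpool living on a zvol whose parent pool has no
# redundancy. One property per "zfs set" call for older releases.

# On the host: cache only metadata, and leave checksumming and compression
# to the child pool, which does both itself.
host_zvol_settings() {
    for prop in primarycache=metadata checksum=off compression=off; do
        zfs set "$prop" "$1"
    done
}

# In the guest (or wherever the child pool lives): what you would use on bare
# drives, i.e. defaults plus lz4 compression and no atime updates.
guest_pool_settings() {
    for prop in compression=lz4 atime=off; do
        zfs set "$prop" "$1"
    done
}

# Usage (names are hypothetical):
#   host:  sh zfs-on-zvol.sh host tank/vm/guest0-disk0
#   guest: sh zfs-on-zvol.sh guest guestpool
case "${1:-}" in
    host)  host_zvol_settings "${2:?zvol name required}" ;;
    guest) guest_pool_settings "${2:?pool name required}" ;;
esac
```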

With ufs/ffs/ntfs/ext4 and most other filesystems atop a zvol, I use checksums
on the parent zvol, and compression too if the child doesn’t support it (ntfs
does), but I still cache only metadata on the host and let the child VM/fs
cache the real data.
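The same idea for a ‘dumb’ child filesystem, as a host-side sketch. The helper and dataset names are hypothetical, and lz4 is an assumed choice for the parent's compression algorithm.

```shell
#!/bin/sh
# dumb_fs_zvol_settings ZVOL CHILD_COMPRESSES   (hypothetical helper, host side)
# For a zvol carrying UFS/FFS, NTFS, ext4 or another filesystem without data
# checksums. CHILD_COMPRESSES is "yes" if the guest compresses its own data.
dumb_fs_zvol_settings() {
    zfs set primarycache=metadata "$1"   # host caches metadata; the guest caches data
    zfs set checksum=on "$1"             # parent checksums for the child (on is the default)
    if [ "$2" != yes ]; then
        zfs set compression=lz4 "$1"     # child doesn't compress, so the parent does
    fi
}

# Usage (names are hypothetical):
#   sh dumb-fs-zvol.sh tank/vm/ufs-guest-disk0 no    # UFS has no compression
#   sh dumb-fs-zvol.sh tank/vm/win-guest-disk0 yes   # NTFS compression enabled
if [ "$#" -eq 2 ]; then
    dumb_fs_zvol_settings "$1" "$2"
fi
```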

My use case involves charging customers for their memory use, so admittedly
that is one motivating factor, LOL. Plus, I certainly don’t want one rude VM
marching through the host ARC, unfairly evicting and starving the other polite
VMs.

A VM’s swap space is another consideration. I treat it like any other ‘dumb’
filesystem, with compression and checksumming done by the parent, but recent
versions of many operating systems may page out only already-compressed data,
so investigate your guest OS. I’ve found lz4’s claims of an almost-no-penalty
early abort to be vastly overstated when dealing with zvols, small block sizes
and high throughput, so if you can be certain you’ll be dealing with only
compressed data, turn it off. For the virtual memory pagers in most current-day
OSes, though, set compression on the swap’s backing zvol to lz4.
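Creating a guest's swap zvol along those lines might look like the sketch below. Names and size are hypothetical, and checksum=on is the default, spelled out only for clarity; the yes/no argument is whatever your research into the guest OS's pager turns up.

```shell
#!/bin/sh
# swap_zvol_create ZVOL SIZE GUEST_PAGES_COMPRESSED   (hypothetical helper, host side)
# Treat guest swap like any other "dumb" filesystem: the parent checksums it
# and compresses it with lz4, unless the guest is known to page out only
# already-compressed data, in which case compression is turned off.
swap_zvol_create() {
    if [ "$3" = yes ]; then
        comp=off
    else
        comp=lz4
    fi
    zfs create -V "$2" -o compression="$comp" -o checksum=on "$1"
}

# Usage (names and size are hypothetical):
#   sh swap-zvol.sh tank/vm/guest0-swap 4G no
if [ "$#" -eq 3 ]; then
    swap_zvol_create "$1" "$2" "$3"
fi
```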

Another factor is the ZIL. One VM can hoard your synchronous write performance. 
Solutions are beyond the scope of this already-too-long email :) but I’d be 
happy to elaborate if queried.

And then there’s always netbooting guests from NFS mounts served by the host
and giving the guest no virtual disks; don’t forget to consider that option.

Hope this provokes some fruitful ideas for you. Glad to philosophize about ZFS 
setups with y’all :)

freebsd-virtualization@freebsd.org mailing list
To unsubscribe, send any mail to 
