On 3/4/20 2:58 PM, Matthew Ahrens wrote:
> * Directio (Mark M)
>   o User interface
>     + What happens to partial-block writes that are DIRECTIO-requested?
>       # Nobody wants to argue against failing partial block writes

qemu is an application which uses O_DIRECT when configured in its cache=none mode (and also in its cache=directsync mode, which additionally uses O_DSYNC): https://documentation.suse.com/sles/11-SP4/html/SLES-kvm4zseries/cha-qemu-cachemodes.html
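For reference, a minimal sketch (assuming Linux/glibc; "disk.img" is a placeholder path) of how those two qemu cache modes map to open(2) flags, per the SUSE documentation linked above:

#define _GNU_SOURCE /* for O_DIRECT on Linux */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* cache=none: bypass the host page cache; no per-write sync guarantee */
    int fd_none = open("disk.img", O_RDWR | O_DIRECT);

    /* cache=directsync: bypass the host cache AND make each write durable */
    int fd_dsync = open("disk.img", O_RDWR | O_DIRECT | O_DSYNC);

    if (fd_none < 0 || fd_dsync < 0)
        perror("open");

    if (fd_none >= 0)
        close(fd_none);
    if (fd_dsync >= 0)
        close(fd_dsync);
    return 0;
}

Both modes bypass the host page cache; only cache=directsync additionally forces each write to be durable before it completes.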
In this use case (qemu/virtualization), the key desired benefit of O_DIRECT is that it bypasses the host's cache. I think the ideal mapping in this use case is that O_DIRECT should have the same effect as primarycache=metadata, and ZFS should /not/ require recordsize-sized writes (/not/ "fail[] partial block writes").

VMs are generally writing in 512 B or 4 KiB blocks; it is almost certainly not feasible to get the VM to write in e.g. 128 KiB blocks (a sketch of this write pattern follows at the end of this message). If the application is required to write in recordsize-sized blocks (and assuming the application does not want to take on the read-modify-write itself, which I think is the case here), this would force the administrator to set the recordsize to 512 B or 4 KiB. Using a recordsize that small is detrimental in terms of compression ratios, metadata overhead, raidz space overhead (if you use raidz), etc.

A reasonable counter-argument would be that for this use case, the administrator could set the proposed option to keep the current direct IO behavior (i.e. just ignore O_DIRECT) and then set primarycache=metadata. If it turns out that requiring recordsize-sized writes is enough of a win in other use cases, at least we have a decent fallback for this one.

A further question was raised about what downsides caching has. For the virtualization use case, I've always had a particular concern. Fundamentally, caching here is using the host's RAM to speed up disk IO, yet in virtualization deployments the vast majority of host RAM is allocated to guests. This leads to a capacity-planning/predictability concern. Imagine I have a host with e.g. 64 GiB of RAM, I've allocated 32 GiB of it to guests, and everything is working fine. Can I start another guest that uses 16 GiB of RAM? It seems that I should be able to, and if I'm using uncached (on the host) IO, I definitely can. However, if I'm using the host's RAM for caching, taking 16 GiB away from the cache could have detrimental performance effects. In other words, I might be (inadvertently) relying upon the caching to provide acceptable IO performance. Eliminating the host cache ensures that all IO caching occurs in the guest's RAM, which makes RAM allocation easier to understand and more predictable.
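For concreteness, here is a minimal sketch of the guest-style write pattern in question: a 4 KiB aligned O_DIRECT write into a file on a dataset whose recordsize is 128 KiB. "disk.img" is again a placeholder path, and the exact errno a strict implementation would return is an assumption, not something settled in this thread. Under the "fail partial-block writes" proposal the pwrite() below would be rejected; under the primarycache=metadata-like mapping argued for above it would succeed, with ZFS performing the read-modify-write.

#define _GNU_SOURCE /* for O_DIRECT on Linux */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    int fd = open("disk.img", O_RDWR | O_DIRECT);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* O_DIRECT requires an aligned buffer; 4096 matches a typical guest block */
    void *buf;
    if (posix_memalign(&buf, 4096, 4096) != 0) {
        close(fd);
        return 1;
    }
    memset(buf, 0, 4096);

    /* A 4 KiB write is a partial-block write whenever recordsize > 4 KiB */
    ssize_t n = pwrite(fd, buf, 4096, 0);
    if (n < 0)
        perror("pwrite"); /* would fail under the strict proposal (errno assumed) */

    free(buf);
    close(fd);
    return 0;
}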
--
Richard