> -----Original Message-----
> From: Daniel Verkamp [mailto:[email protected]]
> Sent: Tuesday, October 16, 2018 7:16 AM
> To: [email protected]
> Cc: [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; Liu, Changpeng <[email protected]>
> Subject: Re: [PATCH v8] virtio_blk: add discard and write zeroes support
>
> On Mon, Oct 15, 2018 at 2:27 AM Christoph Hellwig <[email protected]> wrote:
> > On Fri, Oct 12, 2018 at 02:06:28PM -0700, Daniel Verkamp wrote:
> > > From: Changpeng Liu <[email protected]>
> > >
> > > In commit 88c85538, "virtio-blk: add discard and write zeroes features
> > > to specification" (https://github.com/oasis-tcs/virtio-spec), the virtio
> >
> > There is some issues in this spec. For one using the multiple ranges
> > also for write zeroes is rather inefficient. Write zeroes really should
> > use the same format as read and write.
>
> I wasn't involved in the writing of the spec, so I'll defer to Michael
> and Changpeng here, but I'm not sure how "set in stone" the virtio
> specification is, or if it can be updated somehow without breaking
> compatibility.
>
> I agree that Write Zeroes would be simpler to implement as a single
> LBA + length rather than a list. However, it's not really possible to
> use the same format as the regular virtio block read/write commands
> (VIRTIO_BLK_T_IN/VIRTIO_BLK_T_OUT), since the read/write commands do
> not specify a length explicitly; length is implied by the length of
> the data buffer as defined by the virtio descriptor, but a Write
> Zeroes command does not require a data buffer. At best, this could be
> a separate command mirroring the layout of struct virtio_blk_req but
> with data replaced with a length field; I'm not sure that buys much in
> the way of consistency.
Yeah, that's the consideration here.
>
> > Second the unmap flag isn't properly specified at all, as nothing
> > says the device may not unmap without the unmap flag. Please take
> > a look at the SCSI or NVMe ѕpec for some guidance.
>
> This could probably use some clarifying text in the specification, but
> given that there is nothing in the spec describing what the device
> needs to do when unmap = 0, I would assume that the device can do
> whatever it likes, as long as the blocks read back as 0s afterwards.
> Reading back 0s is required by the definition of the Write Zeroes
> command in the same virtio spec change. It would probably be good to
> clarify this and explicitly define what the device is allowed to do in
> response to both settings of the unmap bit.
>
> My understanding of the corresponding feature in NVMe (the Deallocate
> bit in the Write Zeroes command) is that the only difference between
> Deallocate = 1 and 0 is that the device "should" versus "may" (no
> "shall" on either side) deallocate the corresponding blocks, but only
> if the device supports reading 0s back after blocks are deallocated.
> If the device does not support reading 0s after deallocation, it is
> not allowed to deallocate blocks as part of a Write Zeroes command
> regardless of the setting of the Deallocate bit.
>
> Some similar wording could probably be added to the virtio spec to
> clarify the meaning of unmap, although I would prefer something that
> makes it a little clearer that the bit is only intended as a hint from
> the driver to indicate whether the device should attempt to keep
> storage allocated for the zeroed blocks, if that is indeed the
> intended behavior.
Yes, that's the original idea. Adding a clear description to the specification
may be better.
>
> Is there some in-kernel doc that describes what behavior the Linux
> block layer needs from a write zeroes command?
>
> > > +static inline int virtblk_setup_discard_write_zeroes(struct request *req,
> > > + bool unmap)
> >
> > Why is this an inline function?
>
> I don't think there's any reason it needs to be inline; I can drop the
> inline in the next revision.
>
> Given (as far as I can tell) your concerns seem to apply to the Write
> Zeroes command specifically, would it be reasonable to start with a
> patch that just adds support for the Discard command (along with fixes
> for Ming's feedback)? This would be sufficient for my particular use
> case (although I can't speak for Changpeng), and we can revisit Write
> Zeroes once the spec concerns are worked out.
>
> Thanks,
> -- Daniel