On 03/31/2016 07:53 AM, Alex Bligh wrote:
> 
> On 31 Mar 2016, at 14:02, Denis V. Lunev <d...@openvz.org> wrote:
> 
>> From: Pavel Borzenkov <pborzen...@virtuozzo.com>
>>
>> There exist some cases when a client knows that the data it is going to
>> write is all zeroes. Such cases include mirroring or backing up a device
>> implemented by a sparse file.
> 
> Useful.
> 
>> -- bit 0, `NBD_CMD_FLAG_FUA`; valid during `NBD_CMD_WRITE`.  SHOULD be
>> -  set to 1 if the client requires "Force Unit Access" mode of
>> -  operation.  MUST NOT be set unless transmission flags included
>> -  `NBD_FLAG_SEND_FUA`.
>> +- bit 0, `NBD_CMD_FLAG_FUA`; valid during `NBD_CMD_WRITE` and
>> +  `NBD_CMD_WRITE_ZEROES` commands.  SHOULD be set to 1 if the client requires
>> +  "Force Unit Access" mode of operation.  MUST NOT be set unless transmission
>> +  flags included `NBD_FLAG_SEND_FUA`.
> 
> Not your fault, but this should actually say "unless export flags
> included". Transmission flags would be the flags with the command.

No, we only just renamed 'export flags' to 'transmission flags', to
represent the 16 bits sent by the server at the end of handshake phase;
these are named 'NBD_FLAG_*'.  We still use the term 'command flags'
(although maybe 'request flags' is better) for the 16 bits sent with
each request; these are named 'NBD_CMD_FLAG_*'.

So Pavel's text is correct as-is.
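
To make the naming concrete, here is a minimal C sketch of the rule in
the quoted text: a per-request command flag may only be set if the
matching transmission flag was advertised once at the end of the
handshake.  The helper name `cmd_flags_allowed` is hypothetical, and
placing `NBD_FLAG_SEND_FUA` at bit 3 is my assumption rather than
something stated in this patch; only bit 0 for `NBD_CMD_FLAG_FUA` comes
from the text above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Transmission flag: part of the 16 bits the server sends once, at the
 * end of the handshake.  Bit 3 is assumed here. */
#define NBD_FLAG_SEND_FUA (1u << 3)

/* Command flag: part of the 16 bits sent with each request (bit 0). */
#define NBD_CMD_FLAG_FUA (1u << 0)

/* Hypothetical client-side check: a command flag may only be set when
 * the matching transmission flag was advertised by the server. */
static bool cmd_flags_allowed(uint16_t transmission_flags, uint16_t cmd_flags)
{
    if ((cmd_flags & NBD_CMD_FLAG_FUA) && !(transmission_flags & NBD_FLAG_SEND_FUA))
        return false;
    return true;
}
```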

> 
>> +- bit 1, `NBD_CMD_MAY_TRIM`; defined by the experimental `WRITE_ZEROES`
>> +  extension; see below.
> 
> For consistency, probably useful to say here:
> 
> MUST NOT be set unless the export flags include NBD_FLAG_SEND_WRITE_ZEROES.

Elsewhere, when defining an experimental extension, the forward
reference has been as sparse as possible; so this sentence (about the
transmission flags including NBD_FLAG_SEND_WRITE_ZEROES) should appear
only in the experimental section, if it is not already there.


>>
>> +### `WRITE_ZEROES` extension
>> +
>> +There exist some cases when a client knows that the data it is going to write
>> +is all zeroes. Such cases include mirroring or backing up a device implemented
>> +by a sparse file. With current NBD command set, the client has to issue
>> +`NBD_CMD_WRITE` command with zeroed payload and transfer these zero bytes
>> +through the wire. The server has to write the data onto disk, effectively
>> +losing the sparseness.
>> +
>> +To remedy this, a `WRITE_ZEROES` extension is envisioned. This extension adds
>> +one new command and one new command flag.
>> +
>> +* `NBD_CMD_WRITE_ZEROES` (6)

Wouter recently pointed out that we explicitly do NOT want to repeat
constants in more than one location; define the value (6) above, where
you make the forward reference in the normative section, then keep the
experimental section referring to the command by name only.  This is
especially useful if we end up renumbering things, since we have multiple
extension proposals in flight at the moment.
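
For a sense of what the new command saves on the wire, here is a minimal
C sketch that encodes an `NBD_CMD_WRITE_ZEROES` request, with the command
value defined in exactly one place.  It assumes the request header layout
as I understand it (32-bit magic 0x25609513, 16-bit command flags, 16-bit
type, 64-bit handle, 64-bit offset, 32-bit length, all big-endian);
`encode_write_zeroes` and the `put_be*` helpers are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Values defined in exactly one place; 6 is the proposed command number. */
#define NBD_REQUEST_MAGIC    0x25609513u
#define NBD_CMD_WRITE_ZEROES 6u
#define NBD_REQUEST_HDR_LEN  28u /* 4 + 2 + 2 + 8 + 8 + 4 bytes */

/* Big-endian (network order) stores. */
static void put_be16(uint8_t *p, uint16_t v) { p[0] = (uint8_t)(v >> 8); p[1] = (uint8_t)v; }
static void put_be32(uint8_t *p, uint32_t v) { put_be16(p, (uint16_t)(v >> 16)); put_be16(p + 2, (uint16_t)v); }
static void put_be64(uint8_t *p, uint64_t v) { put_be32(p, (uint32_t)(v >> 32)); put_be32(p + 4, (uint32_t)v); }

/* Hypothetical helper: fill in the fixed-size request header.  Unlike
 * NBD_CMD_WRITE, no `length` bytes of payload follow it on the wire. */
static size_t encode_write_zeroes(uint8_t out[NBD_REQUEST_HDR_LEN], uint16_t cmd_flags,
                                  uint64_t handle, uint64_t offset, uint32_t length)
{
    put_be32(out, NBD_REQUEST_MAGIC);
    put_be16(out + 4, cmd_flags);
    put_be16(out + 6, NBD_CMD_WRITE_ZEROES);
    put_be64(out + 8, handle);
    put_be64(out + 16, offset);
    put_be32(out + 24, length);
    return NBD_REQUEST_HDR_LEN; /* total bytes to send for this request */
}
```

With this layout, zeroing 1 MiB costs a single 28-byte request, where
`NBD_CMD_WRITE` would send the same 28-byte header plus 1048576 bytes of
zero payload.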


>> +    If the flag `NBD_CMD_FLAG_MAY_TRIM` was set by the client in the command
>> +    flags field, the server MAY use trimming to zero out the area, but it
>> +    MUST ensure that the data reads back as zero.
>> +
> 
> Can you give an example of a situation where the client would not set this
> and it would be undesirable for the server to create a 'hole' using
> 'trim' type technology, even when the client doesn't specify it?

Yes, I can see situations where the client REQUIRES that the server
write actual zeroes, rather than trimming.  The biggest reason is that
in an environment where storage can be oversubscribed (multiple sparse
files whose nominal sizes add up to more than the underlying storage
can hold), explicitly writing zeroes without punching a hole guarantees
that YOUR file has storage allocated to it (whereas if YOUR file is
trimmed, some other file can then use enough allocation to prevent you
from actually writing data in place of the hole).  Of course, the client
can still achieve this by sticking with NBD_CMD_WRITE, but that requires
more network traffic.
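
On a Linux server backed by a regular file, the two behaviours map
roughly onto two `fallocate(2)` modes: `FALLOC_FL_PUNCH_HOLE`, which
releases the blocks, and `FALLOC_FL_ZERO_RANGE`, which (on filesystems
that implement it) zeroes the range and preallocates any holes in it.
The sketch below is only an assumption about how a server might do this,
not anything in the patch; `zero_range` and `write_zero_bytes` are
hypothetical, and falling back to writing real zeroes when a mode is
unsupported is my own choice.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/falloc.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/* Fallback: write real zero bytes.  Reads return zero and the range ends
 * up allocated, at the cost of actual I/O. */
static int write_zero_bytes(int fd, off_t offset, off_t len)
{
    static const char zeros[65536];
    while (len > 0) {
        size_t chunk = len < (off_t)sizeof zeros ? (size_t)len : sizeof zeros;
        ssize_t n = pwrite(fd, zeros, chunk, offset);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        offset += n;
        len -= n;
    }
    return 0;
}

/* Hypothetical server helper.  must_allocate models a client that needs
 * the storage reserved (the oversubscription case); otherwise the range
 * may become a hole, which still reads back as zeroes. */
static int zero_range(int fd, off_t offset, off_t len, bool must_allocate)
{
    assert(offset >= 0 && len >= 0);
    if (len == 0)
        return 0; /* fallocate() rejects a zero length */
    int mode = must_allocate ? FALLOC_FL_ZERO_RANGE
                             : (FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE);
    if (fallocate(fd, mode, offset, len) == 0)
        return 0;
    if (errno != EOPNOTSUPP && errno != ENOSYS)
        return -1;
    return write_zero_bytes(fd, offset, len);
}
```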

However, having written that, I'm thinking we have the wrong sense for
the flag.  I think it makes more sense to allow trim/hole-punching by
default (but ONLY when the server can guarantee that reads will still be
zeroes), and make the flag NBD_CMD_FLAG_NO_TRIM to explicitly specify
the cases where the server MUST NOT trim but allocate and write actual
zeroes.  I suspect that explicit allocation requests are less common,
and also less efficient; so it makes sense to gear the default state of
the flag towards efficiency (both in the sense that punching holes can be
faster than writing zeroes, and in the sense that most people LIKE the
storage savings of sparse files).
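
Under the inverted sense proposed here, the server-side decision might
look like the following sketch.  `NBD_CMD_FLAG_NO_TRIM` is only a
proposed name; placing it at bit 1 (the slot the patch currently gives
the trim flag) is an assumption, as are the helper and enum names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Proposed inverted flag; bit 1 is assumed. */
#define NBD_CMD_FLAG_NO_TRIM (1u << 1)

enum zero_strategy {
    ZERO_PUNCH_HOLE,     /* trim/hole-punch: saves space, may be faster */
    ZERO_WRITE_ALLOCATED /* write actual zeroes: storage stays allocated */
};

/* Hypothetical server policy for NBD_CMD_WRITE_ZEROES: trimming is the
 * default, but only when the backend can promise that the trimmed range
 * reads back as zeroes; NO_TRIM forces allocation. */
static enum zero_strategy choose_zero_strategy(uint16_t cmd_flags, bool hole_reads_as_zero)
{
    if (cmd_flags & NBD_CMD_FLAG_NO_TRIM)
        return ZERO_WRITE_ALLOCATED;
    return hole_reads_as_zero ? ZERO_PUNCH_HOLE : ZERO_WRITE_ALLOCATED;
}
```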

> I suspect there are already some backends (e.g. ceph on qemu-nbd) which
> will effectively do a 'trim' if you write 4k of zeroes even under
> current circumstances.
> 
> IE why not always permit trimming PROVIDED the data always reads back
> as zero? This would be far simpler.
> 

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

_______________________________________________
Nbd-general mailing list
Nbd-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nbd-general
