Should not be applied until we have a working implementation,
in case we need to tweak things.

Demonstrates the amount of word-smithing required to promote
structured replies to non-experimental.  In many cases, I was
able to preserve entire paragraphs (but sometimes reflowed at
different indentation).

Signed-off-by: Eric Blake <>
 doc/ | 621 ++++++++++++++++++++++++++---------------------------------
 1 file changed, 273 insertions(+), 348 deletions(-)

diff --git a/doc/ b/doc/
index cd59d81..7bc65f8 100644
--- a/doc/
+++ b/doc/
@@ -182,29 +182,32 @@ required to.

 ### Transmission

-There are two message types in the transmission phase: the request,
-and the simple reply.  The phase consists of a series of transactions,
-where the client submits requests and the server sends corresponding
-replies, with a single simple reply message per request, and continues
-until either side closes the connection.
+There are three message types in the transmission phase: the request,
+the simple reply, and the structured reply chunk.  The phase consists
+of a series of transactions, where the client submits requests and the
+server sends corresponding replies, either a single simple reply or a
+series of one or more structured reply chunks delineated by a
+concluding flag.  This phase continues until either side closes the
+connection.

 Replies need not be sent in the same order as requests (i.e., requests
-may be handled by the server asynchronously).  Clients SHOULD use a
-handle that is distinct from all other currently pending transactions,
-but MAY reuse handles that are no longer in flight; handles need not
-be consecutive.  In each reply message, the server MUST use the same
-value for handle as was sent by the client in the corresponding
-request.  In this way, the client can correlate which request is
-receiving a response.
+may be handled by the server asynchronously).  Where a reply consists
+of multiple structured reply chunks, the intermediate chunks MAY be
+reordered within constraints documented by the request, and the chunks
+MAY be interleaved with messages from other pending transactions.
+Clients SHOULD use a handle that is distinct from all other currently
+pending transactions, but MAY reuse handles that are no longer in
+flight; handles need not be consecutive.  In each reply message, the
+server MUST use the same value for handle as was sent by the client in
+the corresponding request.  In this way, the client can correlate
+which request is receiving a response.

 Note that it is impossible to tell by reading just the server traffic
 whether a data field of a simple reply will be present; the simple
 reply is also problematic for error handling of the `NBD_CMD_READ`
-request.  Therefore, the experimental `STRUCTURED_REPLY` extension
-creates a context-free server stream by adding an additional
-structured reply type, and documents that it is possible to have
-multiple structured reply messages (called chunks) in response to a
-single request message; see below.
+request.  Therefore, servers SHOULD support the structured reply
+extension, and "fixed newstyle" clients SHOULD use
+`NBD_OPT_STRUCTURED_REPLY` to negotiate structured replies.

 #### Request message

@@ -245,6 +248,28 @@ S: 32 bits, error (MAY be zero)
 S: 64 bits, handle  
 S: (*length* bytes of data if the request is of type `NBD_CMD_READ`)  

+#### Structured reply message chunk
+  Unless explicitly documented for a given request, a structured reply
+  MUST occupy only one message (similar to a simple reply).  However,
+  some requests document that a structured reply MAY occupy multiple
+  chunks; each chunk uses a structured reply message (all with the
+  same value for "handle"), and the `NBD_REPLY_FLAG_DONE` reply flag
+  is used to identify the final chunk.
+  A structured reply message looks as follows:
+  S: 32 bits, 0x668e33ef, magic (`NBD_STRUCTURED_REPLY_MAGIC`)  
+  S: 16 bits, flags  
+  S: 16 bits, type  
+  S: 64 bits, handle  
+  S: 32 bits, length of payload (unsigned)  
+  S: *length* bytes of payload data (if *length* is non-zero)
+  The use of *length* in the reply allows context-free division of the
+  overall server traffic into individual reply messages; the *type*
+  field describes how to further interpret the payload.
 ## Values

 This section describes the value and meaning of constants (other than
@@ -288,8 +313,14 @@ immediately after the handshake flags field in oldstyle 
   schedule I/O accesses as for a rotational medium
 - bit 5, `NBD_FLAG_SEND_TRIM`; should be set to 1 if the server supports
   `NBD_CMD_TRIM` commands
-- bit 6, `NBD_FLAG_SEND_DF`; defined by the `STRUCTURED_REPLY` extension;
-  see below.
+- bit 6, `NBD_FLAG_SEND_DF`; MUST be set to 1 if structured replies
+  have been negotiated, and MUST NOT be set otherwise; that way, the
+  client MAY use this flag as a reliable witness of whether to expect
+  a simple reply or structured reply to the `NBD_CMD_READ`
+  transmission request.
+  Additionally, clients MUST NOT set the `NBD_CMD_FLAG_DF` request
+  flag unless this transmission flag is set.

 Clients SHOULD ignore unknown flags.

@@ -380,7 +411,27 @@ of the newstyle negotiation.


-    Defined by the experimental `STRUCTURED_REPLY` extension; see below.
+    The client wishes to use structured replies during the
+    transmission phase.  The option request has no additional data.
+    The server replies with the following:
+    - `NBD_REP_ACK`: Structured replies have been negotiated; the
+      server MUST set the `NBD_FLAG_SEND_DF` flag in all future
+      transmission flags, and MUST use structured replies to the
+      `NBD_CMD_READ` transmission request.  Further extensions that
+      use structured replies may now be negotiated.
+    - For backwards compatibility, clients SHOULD be prepared to also
+      handle `NBD_REP_ERR_UNSUP`; in this case, no structured replies
+      will be sent.
+    It is envisioned that future extensions will add other new
+    requests that also require a data payload in the reply.  Such
+    extensions MUST use a structured reply, and not a simple reply.  A
+    server that supports such extensions MUST NOT advertise those
+    extensions until the client negotiates structured replies; and a
+    client MUST NOT make use of those extensions without first
+    enabling the `NBD_OPT_STRUCTURED_REPLY` extension.

 #### Option reply types

@@ -481,8 +532,13 @@ valid may depend on negotiation during the handshake phase.
   set to 1 if the client requires "Force Unit Access" mode of
   operation.  MUST NOT be set unless transmission flags included
-- bit 1, `NBD_CMD_FLAG_DF`; defined by the experimental `STRUCTURED_REPLY`
-  extension; see below
+- bit 1, `NBD_CMD_FLAG_DF`; valid during `NBD_CMD_READ`.  The "don't
+  fragment" bit.  SHOULD be set to 1 if the client requires the server
+  to send at most one data chunk in reply.  MUST NOT be set unless the
+  transmission flags include `NBD_FLAG_SEND_DF`.  Use of this flag MAY
+  trigger an `EOVERFLOW` error chunk, if the request length is too
+  large.

 #### Request types

@@ -490,10 +546,11 @@ The following request types exist:

 * `NBD_CMD_READ` (0)

-    A read request. Length and offset define the data to be read. The
-    server MUST reply with a simple reply header, followed immediately
-    by len bytes of data, read from offset bytes into the file, unless
-    an error condition has occurred.
+    A read request. Length and offset define the data to be read. If
+    structured replies have not been negotiated, the server MUST reply
+    with a simple reply header, followed immediately by len bytes of
+    data, read from offset bytes into the file, unless an error
+    condition has occurred.

     If an error occurs, the server SHOULD set the appropriate error code
     in the error field. The server MUST then either close the
@@ -504,10 +561,79 @@ The following request types exist:
     signalling no error), the server MUST immediately close the
     connection; it MUST NOT send any further data to the client.

-    The experimental `STRUCTURED_REPLY` extension changes from a
-    simple reply to a structured reply, in part to allow recovery
-    after a partial read and more efficient reads of sparse files; see
-    below.
+    If structured replies are negotiated, then a read request MUST
+    result in a structured reply that MAY contain one or more chunks
+    (each using magic 0x668e33ef `NBD_STRUCTURED_REPLY_MAGIC`), with
+    the following additional constraints.
+    The server MAY split the reply into any number of data chunks
+    (reply types of `NBD_REPLY_TYPE_OFFSET_DATA` and
+    `NBD_REPLY_TYPE_OFFSET_HOLE`); each chunk MUST describe at least
+    one byte, although to minimize overhead, the server SHOULD use
+    chunks where lengths and offsets are an integer multiple of 512
+    bytes, where possible (the first and last chunk of an unaligned
+    read being the most obvious place for an exception).  The server
+    MUST NOT send data chunks that overlap each other or any earlier
+    error chunks, and MUST NOT send chunks that describe data outside
+    the offset and length of the request, but MAY send the chunks in
+    any order (the client MUST reassemble data chunks into the correct
+    order), and MAY send additional data chunks even after reporting
+    an error chunk.  Note that a request for more than 2^32 - 8 bytes
+    MUST be split into at least two chunks, so as not to overflow the
+    length field of a reply while still allowing space for the offset
+    of each chunk.  When no error is detected, the server MUST send
+    enough data chunks to cover the entire region described by the
+    offset and length of the client's request.
+    To minimize traffic, the server MAY set the `NBD_REPLY_FLAG_DONE`
+    on the final data chunk (in which case it MUST NOT send any
+    further non-data chunks), but MUST NOT do so if it would still be
+    possible to detect an error while transmitting the chunk.  If the
+    last data chunk is not the final reply, the server MUST send a
+    final chunk with type `NBD_REPLY_TYPE_NONE` (and the flag
+    `NBD_REPLY_FLAG_DONE` set) to indicate success, or send an error
+    chunk.
+    If an error is detected, the server MUST still complete the
+    transmission of any current chunk (it SHOULD use padding bytes of
+    zero for any remaining data portion of
+    `NBD_REPLY_TYPE_OFFSET_DATA`), but MAY omit further data chunks.
+    The server MUST include an error chunk as one of the subsequent
+    chunks, but MAY defer the error reporting behind other queued
+    chunks.  An error chunk of type `NBD_REPLY_TYPE_ERROR` implies
+    that the client MUST NOT make any assumptions about validity of
+    data chunks, and SHOULD either have `NBD_REPLY_FLAG_DONE` set as
+    the final chunk, or be immediately followed by a chunk of type
+    `NBD_REPLY_TYPE_NONE`.  On the other hand, an error chunk of type
+    `NBD_REPLY_TYPE_ERROR_OFFSET` gives fine-grained information about
+    which earlier data chunk(s) encountered a failure, and MAY also be
+    sent in lieu of a data chunk; as such, a server MAY still usefully
+    follow it with further data chunks or further error offsets.
+    Generally, a server SHOULD NOT mix offset-bearing errors with a
+    generic error.  As long as all errors are accompanied by offsets,
+    the client MAY assume that any data chunks with no subsequent
+    error are valid, that chunks with errors are valid up until the
+    reported offset, and portions of the read that do not have a
+    corresponding data chunk are not valid.  If the final data or
+    error chunk did not have the `NBD_REPLY_FLAG_DONE` bit set, then
+    the server MUST use a final `NBD_REPLY_TYPE_NONE` chunk to
+    complete the reply, but the client MUST NOT treat this type as
+    success if an earlier error chunk was sent.
+    A client MAY close the connection if it detects that the server
+    has sent invalid chunks (such as overlapping data, or not enough
+    data before claiming success).
+    In order to avoid the burden of reassembly, the client MAY set the
+    `NBD_CMD_FLAG_DF` flag (bit 1), which instructs the server to not
+    fragment the reply.  If this flag is set, the server MUST send at
+    most one data chunk, although it MAY still send multiple chunks
+    (the remaining chunks would be error chunks or a final type of
+    `NBD_REPLY_TYPE_NONE`).  A server MAY reject a client's request
+    with the error `EOVERFLOW` if the length is too large to send
+    without fragmentation, in which case it MUST NOT send a data
+    chunk; however, the server MUST NOT use this error if the client's
+    requested length does not exceed 65,536 bytes.

 * `NBD_CMD_WRITE` (1)

@@ -574,6 +700,114 @@ The following request types exist:
     Currently one such message is known: `NBD_CMD_CACHE`, with type set to
     5, implemented by xnbd.

+#### Structured reply flags
+    This field of 16 bits is sent by the server as part of every
+    structured reply.
+    - bit 0, `NBD_REPLY_FLAG_DONE`; the server MUST clear this bit if
+      more structured reply chunks will be sent for the same client
+      request, and MUST set this bit if this is the final reply.  This
+      flag MUST always be set in response to requests which are
+      documented as using a structured reply, but not documented as
+      permitting multiple chunks.
+    The server MUST NOT set any other flags without first negotiating
+    the extension with the client.  Clients that receive an
+    unrecognized flag SHOULD close the connection.
+#### Structured reply types
+    These values are used in the "type" field of a structured reply.
+    Each type determines how to interpret the "length" bytes of
+    payload.  If the client receives an unknown or unexpected type, it
+    SHOULD close the connection.
+    - `NBD_REPLY_TYPE_NONE` (0)
+      *length* MUST be 0 (and the payload field omitted).  This type
+      MUST always be used with the `NBD_REPLY_FLAG_DONE` bit set
+      (that is, it is only useful as the final reply chunk).  If no
+      earlier error chunks were sent, then this type implies that the
+      overall client request is successful.
+      [option #A1]
+      Valid as a reply to `NBD_CMD_READ`.
+      [option #A2]
+      Valid as a reply to any request.
+      This reply type represents an error chunk.  *length* MUST be
+      exactly 4.  The payload is structured as:
+      32 bits: error (MUST be nonzero)  
+      This reply represents that an error occurred, and the client MUST
+      NOT make any assumptions about partial success. This type SHOULD
+      NOT be used unless it is the final reply chunk (where the flag
+      `NBD_REPLY_FLAG_DONE` is set), or if it is immediately followed
+      by a chunk with type `NBD_REPLY_TYPE_NONE`.
+      [option #A1]
+      Valid as a reply to `NBD_CMD_READ`.
+      [option #A2]
+      Valid as a reply to any request.
+      This reply type represents an error chunk.  *length* MUST be
+      exactly 12.  The payload is structured as:
+      32 bits: error (MUST be nonzero)  
+      64 bits: offset (unsigned)  
+      In addition to declaring that an error occurred, this type
+      provides enough additional information to inform the client
+      about any partial success.  *offset* MUST lie within the bounds
+      of the original offset and length of the client's request.  If
+      *offset* also lies within the bounds of an earlier data chunk of
+      the same reply, then the client MAY assume that data within that
+      earlier chunk is valid (while the rest of that chunk MAY be
+      bogus).  Any later data chunks of the same reply MUST NOT
+      contain the offset of this chunk.
+      Valid as a reply to `NBD_CMD_READ`.
+      This reply type represents a data chunk.  *length* MUST be at
+      least 9.  The payload is structured as:
+      64 bits: offset (unsigned)  
+      *length - 8* bytes: data  
+      This reply represents the contents of *length - 8* bytes of the
+      file, starting at *offset*.  The data MUST lie within the bounds
+      of the original offset and length of the client's request, and
+      MUST NOT overlap with any earlier data or error chunks of the
+      same reply.
+      Valid as a reply to `NBD_CMD_READ`.
+      This reply type represents a data chunk.  *length* MUST be
+      exactly 12.  The payload is structured as:
+      64 bits: offset (unsigned)  
+      32 bits: hole size (unsigned)  
+      This reply represents that *hole size* bytes of the file (which
+      MUST be non-zero), starting at *offset*, read as all zeroes.
+      The hole MUST lie within the bounds of the original offset and
+      length of the client's request, and MUST NOT overlap with any
+      earlier data or error chunks of the same reply.
+      Valid as a reply to `NBD_CMD_READ`.
 #### Error values

 The error values are used for the error field in the reply message.
@@ -594,16 +828,22 @@ The following error values are defined:
 * `ENOMEM` (12), Cannot allocate memory.
 * `EINVAL` (22), Invalid argument.
 * `ENOSPC` (28), No space left on device.
-* `EOVERFLOW` (75), Value too large; MUST NOT be sent outside of the
-  experimental `STRUCTURED_REPLY` extension; see below.
+* `EOVERFLOW` (75), Value too large.

 The server SHOULD return `ENOSPC` if it receives a write request
 including one or more sectors beyond the size of the device.  It SHOULD
 return `EINVAL` if it receives a read or trim request including one or
 more sectors beyond the size of the device.  It also SHOULD map the
-`EDQUOT` and `EFBIG` errors to `ENOSPC`.  Finally, it SHOULD return
+`EDQUOT` and `EFBIG` errors to `ENOSPC`.  It SHOULD return
 `EPERM` if it receives a write or trim request on a read-only export.

+The server SHOULD return `EOVERFLOW`, rather than `EINVAL`, when a
+client has requested `NBD_CMD_FLAG_DF` for a length that is too large
+to read without fragmentation.  The server SHOULD NOT return this error
+for a simple reply, MUST NOT return this on a read request that did
+not exceed 65,536 bytes, and SHOULD NOT return this error if
+`NBD_CMD_FLAG_DF` is not set.
 The server SHOULD return `EINVAL` if it receives an unknown command.

 The server SHOULD return `EINVAL` if it receives an unknown command flag. It
@@ -696,321 +936,6 @@ option reply type.
       message if they do not also send it as a reply to the
       `NBD_OPT_SELECT` message.

-### `STRUCTURED_REPLY` extension
-Some of the major downsides of the default simple reply to
-`NBD_CMD_READ` are as follows.  First, it is not possible to support
-partial reads (the command must succeed or fail as a whole, either len
-bytes of data must be sent or the connection must be closed).  There
-is no way to efficiently skip over portions of a sparse file that are
-known to contain all zeroes.  Finally, it is not possible to reliably
-decode the server traffic without also having context of what pending
-read requests were sent by the client.
-To remedy this, a `STRUCTURED_REPLY` extension is envisioned. This
-extension adds a new option request, a new transmission flag, a new
-reply type during the transmission phase, a new command flag, a new
-command error, and alters the reply to the `NBD_CMD_READ` request.
-    The client wishes to use structured replies during the
-    transmission phase.  The option request has no additional data.
-    The server replies with the following:
-    - `NBD_REP_ACK`: Structured replies have been negotiated; the server
-      MUST set the `NBD_FLAG_SEND_DF` flag in all future transmission
-      flags, and MUST use structured replies to the `NBD_CMD_READ`
-      transmission request.  Further extensions that use structured
-      replies may now be negotiated.
-    - For backwards compatibility, clients should be prepared to also
-      handle `NBD_REP_ERR_UNSUP`; in this case, no structured replies
-      will be sent.
-    It is envisioned that future extensions will add other new
-    requests that also require a data payload in the reply.  Such
-    extensions MUST use a structured reply, and not a simple reply.  A
-    server that supports such extensions MUST NOT advertise those
-    extensions until the client negotiates structured replies; and a
-    client MUST NOT make use of those extensions without first
-    enabling the `NBD_OPT_STRUCTURED_REPLY` extension.
-    [option #B1 - transmission flags always mirror current state;
-    state change can be observed if negotiation happens after
-    The server MUST set this transmission flag to 1 if structured
-    replies have been negotiated, and MUST NOT set this flag
-    otherwise; that way, the client MAY reliably use this flag as a
-    reliable witness of whether to expect a simple reply or structured
-    reply to the `NBD_CMD_READ` transmission request.
-    [option #B2 - final transmission flags are accurate, but
-    intermediate transmission flags can anticipate negotiation; state
-    change can be observed if negotiation does not happen]
-    When responding to the `NBD_OPT_EXPORT_NAME` option request (or
-    the `NBD_OPT_SELECT` request of the experimental `SELECT`
-    extension), the server MUST set this transmission flag to 1 if
-    structured replies have been negotiated, and MUST NOT set this
-    flag otherwise; that way, the client MAY reliably use the final
-    state of this flag as a reliable witness of whether to expect a
-    simple reply or structured reply to the `NBD_CMD_READ`
-    transmission request.  When responding to the `NBD_OPT_LIST`
-    option request, the server MAY set this transmission flag, even if
-    structured replies have not yet been negotiated.
-    [all options]
-    Additionally, clients MUST NOT set the `NBD_CMD_FLAG_DF` request
-    flag unless this transmission flag is set.
-* Transmission phase
-    The transmission phase includes a third message type: the
-    structured reply, to be used for commands where the response must
-    include a data payload.  The server MUST NOT send this reply type
-    unless the client has successfully negotiated structured replies
-    via `NBD_OPT_STRUCTURED_REPLY`.  Conversely, the server MUST NOT
-    use a simple reply for `NBD_CMD_READ` if structured replies are
-    negotiated.
-    [option #A1, but not #A2 or #A3]
-    The server MUST NOT use structured replies for requests that never
-    require a data payload in the response.
-    Unless explicitly documented for a given request, a structured
-    reply MUST occupy only one message (similar to a simple reply).
-    However, some requests document that a structured reply MAY occupy
-    multiple chunks; each chunk uses a structured reply message (all
-    with the same value for "handle"), and the `NBD_REPLY_FLAG_DONE`
-    reply flag is used to identify the final chunk.  Where multiple
-    chunks are permitted, the intermediate chunks MAY be reordered
-    within constraints documented by the request, and the chunks MAY
-    be interleaved with messages from other pending transactions; but
-    the final chunk MUST always end the reply.
-    A structured reply message looks as follows:
-    S: 32 bits, 0x668e33ef, magic (`NBD_STRUCTURED_REPLY_MAGIC`)  
-    S: 16 bits, flags  
-    S: 16 bits, type  
-    S: 64 bits, handle  
-    S: 32 bits, length of payload (unsigned)  
-    S: *length* bytes of payload data (if *length* is non-zero)
-    The use of *length* in the reply allows context-free division of
-    the overall server traffic into individual reply messages; the
-    *type* field describes how to further interpret the payload.
-  * Structured reply flags
-    This field of 16 bits is sent by the server as part of every
-    structured reply.
-    - bit 0, `NBD_REPLY_FLAG_DONE`; the server MUST clear this bit if
-      more structured reply chunks will be sent for the same client
-      request, and MUST set this bit if this is the final reply.  This
-      flag must always be set in response to requests which are
-      documented as using a structured reply, but not documented as
-      permitting multiple chunks.
-    The server MUST NOT set any other flags without first negotiating
-    the extension with the client.  Clients that receive an
-    unrecognized flag SHOULD close the connection.
-  * Structured Reply types
-    These values are used in the "type" field of a structured reply.
-    Each type determines how to interpret the "length" bytes of
-    payload.  If the client receives an unknown or unexpected type, it
-    SHOULD close the connection.
-    - `NBD_REPLY_TYPE_NONE` (0)
-      *length* MUST be 0 (and the payload field omitted).  This type
-       MUST always be used with the `NBD_REPLY_FLAG_DONE` bit set
-       (that is, it is only useful as the final reply chunk).  If no
-       earlier error chunks were sent, then this type implies that the
-       overall client request is successful.
-      [option #A1]
-      Valid as a reply to `NBD_CMD_READ`.
-      [option #A2]
-      Valid as a reply to any request.
-      This reply type represents an error chunk.  *length* MUST be
-      exactly 4.  The payload is structured as:
-      32 bits: error (MUST be nonzero)  
-      This reply represents that an error occurred, and the client MAY
-      NOT make any assumptions about partial success. This type SHOULD
-      NOT be used unless it is the final reply chunk (where the flag
-      `NBD_REPLY_FLAG_DONE` is set), or if it is immediately followed
-      by a chunk with type `NBD_REPLY_TYPE_NONE`.
-      [option #A1]
-      Valid as a reply to `NBD_CMD_READ`.
-      [option #A2]
-      Valid as a reply to any request.
-      This reply type represents an error chunk.  *length* MUST be
-      exactly 12.  The payload is structured as:
-      32 bits: error (MUST be nonzero)  
-      64 bits: offset (unsigned)  
-      In addition to declaring that an error occurred, this type
-      provides enough additional information to inform the client
-      about any partial success.  *offset* MUST lie within the bounds
-      of the original offset and length of the client's request.  If
-      *offset* also lies within the bounds of an earlier data chunk of
-      the same reply, then the client MAY assume that data within that
-      earlier chunk is valid (while the rest of that chunk MAY be
-      bogus).  Any later data chunks of the same reply MUST NOT
-      contain the offset of this chunk.
-      Valid as a reply to `NBD_CMD_READ`.
-      This reply type represents a data chunk.  *length* MUST be at
-      least 9.  The payload is structured as:
-      64 bits: offset (unsigned)  
-      *length - 8* bytes: data  
-      This reply represents the contents of *length - 8* bytes of the
-      file, starting at *offset*.  The data MUST lie within the bounds
-      of the original offset and length of the client's request, and
-      MUST NOT overlap with any earlier data or error chunks of the
-      same reply.
-      Valid as a reply to `NBD_CMD_READ`.
-      This reply type represents a data chunk.  *length* MUST be
-      exactly 12.  The payload is structured as:
-      64 bits: offset (unsigned)  
-      32 bits: hole size (unsigned)  
-      This reply represents that *hole size* bytes of the file (which
-      MUST be non-zero), starting at *offset*, read as all zeroes.
-      The hole MUST lie within the bounds of the original offset and
-      length of the client's request, and MUST NOT overlap with any
-      earlier data or error chunks of the same reply.
-      Valid as a reply to `NBD_CMD_READ`.
-    The "don't fragment" bit, valid during `NBD_CMD_READ`.  SHOULD be
-    set to 1 if the client requires the server to send at most one
-    data chunk in reply.  MUST NOT be set unless the transmission
-    flags include `NBD_FLAG_SEND_DF`.  Use of this flag MAY trigger an
-    `EOVERFLOW` error chunk, if the request length is too large.
-    The server SHOULD return `EOVERFLOW`, rather than `EINVAL`, when a
-    client has requested `NBD_CMD_FLAG_DF` for a length that is too
-    large to read without fragmentation.  The server MUST NOT return
-    this error if the read request did not exceed 65,536 bytes, and
-    SHOULD NOT return this error if `NBD_CMD_FLAG_DF` is not set.
-    If structured replies were not negotiated, then a read request
-    MUST always be answered by a simple reply, as documented above
-    (using magic 0x67446698 `NBD_SIMPLE_REPLY_MAGIC`, and containing
-    length bytes of data according to the client's request, although
-    those bytes MAY be invalid if an error is returned, and the
-    connection MUST be closed if an error occurs after a header
-    claiming no error).
-    If structured replies are negotiated, then a read request MUST
-    result in a structured reply that MAY contain one or more chunks
-    (each using magic 0x668e33ef `NBD_STRUCTURED_REPLY_MAGIC`), with
-    the following additional constraints.
-    The server MAY split the reply into any number of data chunks
-    (reply types of `NBD_REPLY_TYPE_OFFSET_DATA` and
-    `NBD_REPLY_TYPE_OFFSET_HOLE`); each chunk MUST describe at least
-    one byte, although to minimize overhead, the server SHOULD use
-    chunks where lengths and offsets are an integer multiple of 512
-    bytes, where possible (the first and last chunk of an unaligned
-    read being the most obvious place for an exception).  The server
-    MUST NOT send data chunks that overlap each other or any earlier
-    error chunks, and MUST NOT send chunks that describe data outside
-    the offset and length of the request, but MAY send the chunks in
-    any order (the client MUST reassemble data chunks into the correct
-    order), and MAY send additional data chunks even after reporting
-    an error chunk.  Note that a request for more than 2^32 - 8 bytes
-    MUST be split into at least two chunks, so as not to overflow the
-    length field of a reply while still allowing space for the offset
-    of each chunk.  When no error is detected, the server MUST send
-    enough data chunks to cover the entire region described by the
-    offset and length of the client's request.
-    To minimize traffic, the server MAY set the `NBD_REPLY_FLAG_DONE`
-    on the final data chunk (in which case it MUST NOT send any
-    further non-data chunks), but MUST NOT do so if it would still be
-    possible to detect an error while transmitting the chunk.  If the
-    last data chunk is not the final reply, the server MUST send a
-    final chunk with type `NBD_REPLY_TYPE_NONE` (and the flag
-    `NBD_REPLY_FLAG_DONE` set) to indicate success, or send an error
-    chunk.
-    If an error is detected, the server MUST still complete the
-    transmission of any current chunk (it SHOULD use padding bytes of
-    zero for any remaining data portion of
-    `NBD_REPLY_TYPE_OFFSET_DATA`), but MAY omit further data chunks.
-    The server MUST include an error chunk as one of the subsequent
-    chunks, but MAY defer the error reporting behind other queued
-    chunks.  An error chunk of type `NBD_REPLY_TYPE_ERROR` implies
-    that the client MAY NOT make any assumptions about validity of
-    data chunks, and SHOULD either have `NBD_REPLY_FLAG_DONE` set as
-    the final chunk, or be immediately followed by a chunk of type
-    `NBD_REPLY_TYPE_NONE`.  On the other hand, an error chunk of type
-    `NBD_REPLY_TYPE_ERROR_OFFSET` gives fine-grained information about
-    which earlier data chunk(s) encountered a failure, and MAY also be
-    sent in lieu of a data chunk; as such, a server MAY still usefully
-    follow it with further data chunks or further error offsets.
-    Generally, a server SHOULD NOT mix errors with offsets with a
-    generic error.  As long as all errors are accompanied by offsets,
-    the client MAY assume that any data chunks with no subsequent
-    error are valid, that chunks with errors are valid up until the
-    reported offset, and portions of the read that do not have a
-    corresponding data chunk are not valid.  If the final data or
-    error chunk did not have the `NBD_REPLY_FLAG_DONE` bit set, then
-    the server MUST use a final `NBD_REPLY_TYPE_NONE` chunk to
-    complete the reply, but the client MUST NOT treat this type as
-    success if an earlier data chunk was sent.
-    A client MAY close the connection if it detects that the server
-    has sent invalid chunks (such as overlapping data, or not enough
-    data before claiming success).
-    In order to avoid the burden of reassembly, the client MAY set the
-    `NBD_CMD_FLAG_DF` flag (bit 1), which instructs the server to not
-    fragment the reply.  If this flag is set, the server MUST send at
-    most one data chunk, although it MAY still send multiple chunks
-    (the remaining chunks would be error chunks or a final type of
-    `NBD_REPLY_TYPE_NONE`).  A server MAY reject a client's request
-    with the error `EOVERFLOW` if the length is too large to send
-    without fragmentation, in which case it MUST NOT send a data
-    chunk; however, the server MUST NOT use this if error the client's
-    requested length does not exceed 65,536 bytes.
 ## About this file

 This file tries to document the NBD protocol as it is currently
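
Reviewer aid, not part of the patch: the structured reply layout added
above (32-bit magic, 16-bit flags, 16-bit type, 64-bit handle, 32-bit
length, then *length* bytes of payload) is what makes the server stream
context-free.  A minimal sketch of that division in Python follows.  It
assumes fields are sent in network byte order (big-endian), as for the
rest of NBD; the helper names `make_chunk` and `split_chunks` and the
placeholder type value in the usage note are hypothetical and not taken
from the spec.

```python
import struct

NBD_STRUCTURED_REPLY_MAGIC = 0x668E33EF
NBD_REPLY_FLAG_DONE = 1 << 0   # bit 0 of the 16-bit flags field
NBD_REPLY_TYPE_NONE = 0

# magic (32), flags (16), type (16), handle (64), length (32).
# Big-endian is an assumption of this sketch.
HEADER = struct.Struct(">IHHQI")


def make_chunk(flags, rtype, handle, payload=b""):
    """Serialize one structured reply chunk (hypothetical helper)."""
    return HEADER.pack(NBD_STRUCTURED_REPLY_MAGIC, flags, rtype,
                       handle, len(payload)) + payload


def split_chunks(stream):
    """Yield (flags, type, handle, payload) for each complete chunk.

    Only the header is needed to find chunk boundaries; no knowledge
    of pending client requests is required.
    """
    pos = 0
    while pos + HEADER.size <= len(stream):
        magic, flags, rtype, handle, length = HEADER.unpack_from(stream, pos)
        if magic != NBD_STRUCTURED_REPLY_MAGIC:
            raise ValueError("not a structured reply chunk")
        start = pos + HEADER.size
        end = start + length
        if end > len(stream):
            break  # payload not fully received yet
        yield flags, rtype, handle, stream[start:end]
        pos = end
```

For instance, a two-chunk reply could be built with
`make_chunk(0, 0x7FFF, 42, b"abc")` followed by
`make_chunk(NBD_REPLY_FLAG_DONE, NBD_REPLY_TYPE_NONE, 42)`, where
`0x7FFF` is only a placeholder type.  `split_chunks` recovers both
chunks, and the client recognizes the second as final by testing
`flags & NBD_REPLY_FLAG_DONE`.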

Nbd-general mailing list
