This patch should not be applied until we have a working implementation, in case we need to tweak things.
Demonstrates the amount of word-smithing required to promote structured replies to non-experimental. In many cases, I was able to preserve entire paragraphs (but sometimes reflowed at different indentation). Signed-off-by: Eric Blake <ebl...@redhat.com> --- doc/proto.md | 621 ++++++++++++++++++++++++++--------------------------------- 1 file changed, 273 insertions(+), 348 deletions(-) diff --git a/doc/proto.md b/doc/proto.md index cd59d81..7bc65f8 100644 --- a/doc/proto.md +++ b/doc/proto.md @@ -182,29 +182,32 @@ required to. ### Transmission -There are two message types in the transmission phase: the request, -and the simple reply. The phase consists of a series of transactions, -where the client submits requests and the server sends corresponding -replies, with a single simple reply message per request, and continues -until either side closes the connection. +There are three message types in the transmission phase: the request, +the simple reply, and the structured reply chunk. The phase consists +of a series of transactions, where the client submits requests and the +server sends corresponding replies, either a single simple reply or a +series of one or more structured reply chunks delineated by a +concluding flag. This phase continues until either side closes the +connection. Replies need not be sent in the same order as requests (i.e., requests -may be handled by the server asynchronously). Clients SHOULD use a -handle that is distinct from all other currently pending transactions, -but MAY reuse handles that are no longer in flight; handles need not -be consecutive. In each reply message, the server MUST use the same -value for handle as was sent by the client in the corresponding -request. In this way, the client can correlate which request is -receiving a response. +may be handled by the server asynchronously). 
Where a reply consists +of multiple structured reply chunks, the intermediate chunks MAY be +reordered within constraints documented by the request, and the chunks +MAY be interleaved with messages from other pending transactions. +Clients SHOULD use a handle that is distinct from all other currently +pending transactions, but MAY reuse handles that are no longer in +flight; handles need not be consecutive. In each reply message, the +server MUST use the same value for handle as was sent by the client in +the corresponding request. In this way, the client can correlate +which request is receiving a response. Note that it is impossible to tell by reading just the server traffic whether a data field of a simple reply will be present; the simple reply is also problematic for error handling of the `NBD_CMD_READ` -request. Therefore, the experimental `STRUCTURED_REPLY` extension -creates a context-free server stream by adding an additional -structured reply type, and documents that it is possible to have -multiple structured reply messages (called chunks) in response to a -single request message; see below. +request. Therefore, servers SHOULD support the structured reply +extension, and "fixed newstyle" clients SHOULD use +`NBD_OPT_STRUCTURED_REPLY` to negotiate structured replies. #### Request message @@ -245,6 +248,28 @@ S: 32 bits, error (MAY be zero) S: 64 bits, handle S: (*length* bytes of data if the request is of type `NBD_CMD_READ`) +#### Structured reply message chunk + + Unless explicitly documented for a given request, a structured reply + MUST occupy only one message (similar to a simple reply). However, + some requests document that a structured reply MAY occupy multiple + chunks; each chunk uses a structured reply message (all with the + same value for "handle"), and the `NBD_REPLY_FLAG_DONE` reply flag + is used to identify the final chunk. 
+ + A structured reply message looks as follows: + + S: 32 bits, 0x668e33ef, magic (`NBD_STRUCTURED_REPLY_MAGIC`) + S: 16 bits, flags + S: 16 bits, type + S: 64 bits, handle + S: 32 bits, length of payload (unsigned) + S: *length* bytes of payload data (if *length* is non-zero) + + The use of *length* in the reply allows context-free division of the + overall server traffic into individual reply messages; the *type* + field describes how to further interpret the payload. + ## Values This section describes the value and meaning of constants (other than @@ -288,8 +313,14 @@ immediately after the handshake flags field in oldstyle negotiation: schedule I/O accesses as for a rotational medium - bit 5, `NBD_FLAG_SEND_TRIM`; should be set to 1 if the server supports `NBD_CMD_TRIM` commands -- bit 6, `NBD_FLAG_SEND_DF`; defined by the `STRUCTURED_REPLY` extension; - see below. +- bit 6, `NBD_FLAG_SEND_DF`; MUST be set to 1 if structured replies + have been negotiated, and MUST NOT be set otherwise; that way, the + client MAY reliably use this flag as a reliable witness of whether + to expect a simple reply or structured reply to the `NBD_CMD_READ` + transmission request. + + Additionally, clients MUST NOT set the `NBD_CMD_FLAG_DF` request + flag unless this transmission flag is set. Clients SHOULD ignore unknown flags. @@ -380,7 +411,27 @@ of the newstyle negotiation. - `NBD_OPT_STRUCTURED_REPLY` (8) - Defined by the experimental `STRUCTURED_REPLY` extension; see below. + The client wishes to use structured replies during the + transmission phase. The option request has no additional data. + + The server replies with the following: + + - `NBD_REP_ACK`: Structured replies have been negotiated; the + server MUST set the `NBD_FLAG_SEND_DF` flag in all future + transmission flags, and MUST use structured replies to the + `NBD_CMD_READ` transmission request. Further extensions that + use structured replies may now be negotiated. 
+ - For backwards compatibility, clients should be prepared to also + handle `NBD_REP_ERR_UNSUP`; in this case, no structured replies + will be sent. + + It is envisioned that future extensions will add other new + requests that also require a data payload in the reply. Such + extensions MUST use a structured reply, and not a simple reply. A + server that supports such extensions MUST NOT advertise those + extensions until the client negotiates structured replies; and a + client MUST NOT make use of those extensions without first + enabling the `NBD_OPT_STRUCTURED_REPLY` extension. #### Option reply types @@ -481,8 +532,13 @@ valid may depend on negotiation during the handshake phase. set to 1 if the client requires "Force Unit Access" mode of operation. MUST NOT be set unless transmission flags included `NBD_FLAG_SEND_FUA`. -- bit 1, `NBD_CMD_FLAG_DF`; defined by the experimental `STRUCTURED_REPLY` - extension; see below + +- bit 1, `NBD_CMD_FLAG_DF`; valid during `NBD_CMD_READ`. The "don't + fragment" bit. SHOULD be set to 1 if the client requires the server + to send at most one data chunk in reply. MUST NOT be set unless the + transmission flags include `NBD_FLAG_SEND_DF`. Use of this flag MAY + trigger an `EOVERFLOW` error chunk, if the request length is too + large. #### Request types @@ -490,10 +546,11 @@ The following request types exist: * `NBD_CMD_READ` (0) - A read request. Length and offset define the data to be read. The - server MUST reply with a simple reply header, followed immediately - by len bytes of data, read from offset bytes into the file, unless - an error condition has occurred. + A read request. Length and offset define the data to be read. If + structured replies have not been negotiated, the server MUST reply + with a simple reply header, followed immediately by len bytes of + data, read from offset bytes into the file, unless an error + condition has occurred. 
If an error occurs, the server SHOULD set the appropriate error code in the error field. The server MUST then either close the @@ -504,10 +561,79 @@ The following request types exist: signalling no error), the server MUST immediately close the connection; it MUST NOT send any further data to the client. - The experimental `STRUCTURED_REPLY` extension changes from a - simple reply to a structured reply, in part to allow recovery - after a partial read and more efficient reads of sparse files; see - below. + If structured replies are negotiated, then a read request MUST + result in a structured reply that MAY contain one or more chunks + (each using magic 0x668e33ef `NBD_STRUCTURED_REPLY_MAGIC`), with + the following additional constraints. + + The server MAY split the reply into any number of data chunks + (reply types of `NBD_REPLY_TYPE_OFFSET_DATA` and + `NBD_REPLY_TYPE_OFFSET_HOLE`); each chunk MUST describe at least + one byte, although to minimize overhead, the server SHOULD use + chunks where lengths and offsets are an integer multiple of 512 + bytes, where possible (the first and last chunk of an unaligned + read being the most obvious place for an exception). The server + MUST NOT send data chunks that overlap each other or any earlier + error chunks, and MUST NOT send chunks that describe data outside + the offset and length of the request, but MAY send the chunks in + any order (the client MUST reassemble data chunks into the correct + order), and MAY send additional data chunks even after reporting + an error chunk. Note that a request for more than 2^32 - 8 bytes + MUST be split into at least two chunks, so as not to overflow the + length field of a reply while still allowing space for the offset + of each chunk. When no error is detected, the server MUST send + enough data chunks to cover the entire region described by the + offset and length of the client's request. 
+ + To minimize traffic, the server MAY set the `NBD_REPLY_FLAG_DONE` + on the final data chunk (in which case it MUST NOT send any + further non-data chunks), but MUST NOT do so if it would still be + possible to detect an error while transmitting the chunk. If the + last data chunk is not the final reply, the server MUST send a + final chunk with type `NBD_REPLY_TYPE_NONE` (and the flag + `NBD_REPLY_FLAG_DONE` set) to indicate success, or send an error + chunk. + + If an error is detected, the server MUST still complete the + transmission of any current chunk (it SHOULD use padding bytes of + zero for any remaining data portion of + `NBD_REPLY_TYPE_OFFSET_DATA`), but MAY omit further data chunks. + The server MUST include an error chunk as one of the subsequent + chunks, but MAY defer the error reporting behind other queued + chunks. An error chunk of type `NBD_REPLY_TYPE_ERROR` implies + that the client MAY NOT make any assumptions about validity of + data chunks, and SHOULD either have `NBD_REPLY_FLAG_DONE` set as + the final chunk, or be immediately followed by a chunk of type + `NBD_REPLY_TYPE_NONE`. On the other hand, an error chunk of type + `NBD_REPLY_TYPE_ERROR_OFFSET` gives fine-grained information about + which earlier data chunk(s) encountered a failure, and MAY also be + sent in lieu of a data chunk; as such, a server MAY still usefully + follow it with further data chunks or further error offsets. + Generally, a server SHOULD NOT mix errors with offsets with a + generic error. As long as all errors are accompanied by offsets, + the client MAY assume that any data chunks with no subsequent + error are valid, that chunks with errors are valid up until the + reported offset, and portions of the read that do not have a + corresponding data chunk are not valid. 
If the final data or + error chunk did not have the `NBD_REPLY_FLAG_DONE` bit set, then + the server MUST use a final `NBD_REPLY_TYPE_NONE` chunk to + complete the reply, but the client MUST NOT treat this type as + success if an earlier data chunk was sent. + + A client MAY close the connection if it detects that the server + has sent invalid chunks (such as overlapping data, or not enough + data before claiming success). + + In order to avoid the burden of reassembly, the client MAY set the + `NBD_CMD_FLAG_DF` flag (bit 1), which instructs the server to not + fragment the reply. If this flag is set, the server MUST send at + most one data chunk, although it MAY still send multiple chunks + (the remaining chunks would be error chunks or a final type of + `NBD_REPLY_TYPE_NONE`). A server MAY reject a client's request + with the error `EOVERFLOW` if the length is too large to send + without fragmentation, in which case it MUST NOT send a data + chunk; however, the server MUST NOT use this error if the client's + requested length does not exceed 65,536 bytes. * `NBD_CMD_WRITE` (1) @@ -574,6 +700,114 @@ The following request types exist: Currently one such message is known: `NBD_CMD_CACHE`, with type set to 5, implemented by xnbd. +#### Structured reply flags + + This field of 16 bits is sent by the server as part of every + structured reply. + + - bit 0, `NBD_REPLY_FLAG_DONE`; the server MUST clear this bit if + more structured reply chunks will be sent for the same client + request, and MUST set this bit if this is the final reply. This + flag must always be set in response to requests which are + documented as using a structured reply, but not documented as + permitting multiple chunks. + + The server MUST NOT set any other flags without first negotiating + the extension with the client. Clients that receive an + unrecognized flag SHOULD close the connection. + +#### Structured reply types + + These values are used in the "type" field of a structured reply. 
+ Each type determines how to interpret the "length" bytes of + payload. If the client receives an unknown or unexpected type, it + SHOULD close the connection. + + - `NBD_REPLY_TYPE_NONE` (0) + + *length* MUST be 0 (and the payload field omitted). This type + MUST always be used with the `NBD_REPLY_FLAG_DONE` bit set + (that is, it is only useful as the final reply chunk). If no + earlier error chunks were sent, then this type implies that the + overall client request is successful. + + [option #A1] + Valid as a reply to `NBD_CMD_READ`. + + [option #A2] + Valid as a reply to any request. + + - `NBD_REPLY_TYPE_ERROR` (1) + + This reply type represents an error chunk. *length* MUST be + exactly 4. The payload is structured as: + + 32 bits: error (MUST be nonzero) + + This reply represents that an error occurred, and the client MAY + NOT make any assumptions about partial success. This type SHOULD + NOT be used unless it is the final reply chunk (where the flag + `NBD_REPLY_FLAG_DONE` is set), or if it is immediately followed + by a chunk with type `NBD_REPLY_TYPE_NONE`. + + [option #A1] + Valid as a reply to `NBD_CMD_READ`. + + [option #A2] + Valid as a reply to any request. + + - `NBD_REPLY_TYPE_ERROR_OFFSET` (2) + + This reply type represents an error chunk. *length* MUST be + exactly 12. The payload is structured as: + + 32 bits: error (MUST be nonzero) + 64 bits: offset (unsigned) + + In addition to declaring that an error occurred, this type + provides enough additional information to inform the client + about any partial success. *offset* MUST lie within the bounds + of the original offset and length of the client's request. If + *offset* also lies within the bounds of an earlier data chunk of + the same reply, then the client MAY assume that data within that + earlier chunk is valid (while the rest of that chunk MAY be + bogus). Any later data chunks of the same reply MUST NOT + contain the offset of this chunk. + + Valid as a reply to `NBD_CMD_READ`. 
+ + - `NBD_REPLY_TYPE_OFFSET_DATA` (3) + + This reply type represents a data chunk. *length* MUST be at + least 9. The payload is structured as: + + 64 bits: offset (unsigned) + *length - 8* bytes: data + + This reply represents the contents of *length - 8* bytes of the + file, starting at *offset*. The data MUST lie within the bounds + of the original offset and length of the client's request, and + MUST NOT overlap with any earlier data or error chunks of the + same reply. + + Valid as a reply to `NBD_CMD_READ`. + + - `NBD_REPLY_TYPE_OFFSET_HOLE` (4) + + This reply type represents a data chunk. *length* MUST be + exactly 12. The payload is structured as: + + 64 bits: offset (unsigned) + 32 bits: hole size (unsigned) + + This reply represents that *hole size* bytes of the file (which + MUST be non-zero), starting at *offset*, read as all zeroes. + The hole MUST lie within the bounds of the original offset and + length of the client's request, and MUST NOT overlap with any + earlier data or error chunks of the same reply. + + Valid as a reply to `NBD_CMD_READ`. + #### Error values The error values are used for the error field in the reply message. @@ -594,16 +828,22 @@ The following error values are defined: * `ENOMEM` (12), Cannot allocate memory. * `EINVAL` (22), Invalid argument. * `ENOSPC` (28), No space left on device. -* `EOVERFLOW` (75), Value too large; MUST NOT be sent outside of the - experimental `STRUCTURED_REPLY` extension; see below. +* `EOVERFLOW` (75), Value too large. The server SHOULD return `ENOSPC` if it receives a write request including one or more sectors beyond the size of the device. It SHOULD return `EINVAL` if it receives a read or trim request including one or more sectors beyond the size of the device. It also SHOULD map the -`EDQUOT` and `EFBIG` errors to `ENOSPC`. Finally, it SHOULD return +`EDQUOT` and `EFBIG` errors to `ENOSPC`. It SHOULD return `EPERM` if it receives a write or trim request on a read-only export. 
+The server SHOULD return `EOVERFLOW`, rather than `EINVAL`, when a +client has requested `NBD_CMD_FLAG_DF` for a length that is too large +to read without fragmentation. The server SHOULD NOT return this error +for a simple reply, MUST NOT return this on a read request that did +not exceed 65,536 bytes, and SHOULD NOT return this error if +`NBD_CMD_FLAG_DF` is not set. + The server SHOULD return `EINVAL` if it receives an unknown command. The server SHOULD return `EINVAL` if it receives an unknown command flag. It @@ -696,321 +936,6 @@ option reply type. message if they do not also send it as a reply to the `NBD_OPT_SELECT` message. -### `STRUCTURED_REPLY` extension - -Some of the major downsides of the default simple reply to -`NBD_CMD_READ` are as follows. First, it is not possible to support -partial reads (the command must succeed or fail as a whole, either len -bytes of data must be sent or the connection must be closed). There -is no way to efficiently skip over portions of a sparse file that are -known to contain all zeroes. Finally, it is not possible to reliably -decode the server traffic without also having context of what pending -read requests were sent by the client. - -To remedy this, a `STRUCTURED_REPLY` extension is envisioned. This -extension adds a new option request, a new transmission flag, a new -reply type during the transmission phase, a new command flag, a new -command error, and alters the reply to the `NBD_CMD_READ` request. - -* `NBD_OPT_STRUCTURED_REPLY` - - The client wishes to use structured replies during the - transmission phase. The option request has no additional data. - - The server replies with the following: - - - `NBD_REP_ACK`: Structured replies have been negotiated; the server - MUST set the `NBD_FLAG_SEND_DF` flag in all future transmission - flags, and MUST use structured replies to the `NBD_CMD_READ` - transmission request. Further extensions that use structured - replies may now be negotiated. 
- - For backwards compatibility, clients should be prepared to also - handle `NBD_REP_ERR_UNSUP`; in this case, no structured replies - will be sent. - - It is envisioned that future extensions will add other new - requests that also require a data payload in the reply. Such - extensions MUST use a structured reply, and not a simple reply. A - server that supports such extensions MUST NOT advertise those - extensions until the client negotiates structured replies; and a - client MUST NOT make use of those extensions without first - enabling the `NBD_OPT_STRUCTURED_REPLY` extension. - -* `NBD_FLAG_SEND_DF` - - [option #B1 - transmission flags always mirror current state; - state change can be observed if negotiation happens after - NBD_OPT_LIST] - The server MUST set this transmission flag to 1 if structured - replies have been negotiated, and MUST NOT set this flag - otherwise; that way, the client MAY reliably use this flag as a - reliable witness of whether to expect a simple reply or structured - reply to the `NBD_CMD_READ` transmission request. - - [option #B2 - final transmission flags are accurate, but - intermediate transmission flags can anticipate negotiation; state - change can be observed if negotiation does not happen] - When responding to the `NBD_OPT_EXPORT_NAME` option request (or - the `NBD_OPT_SELECT` request of the experimental `SELECT` - extension), the server MUST set this transmission flag to 1 if - structured replies have been negotiated, and MUST NOT set this - flag otherwise; that way, the client MAY reliably use the final - state of this flag as a reliable witness of whether to expect a - simple reply or structured reply to the `NBD_CMD_READ` - transmission request. When responding to the `NBD_OPT_LIST` - option request, the server MAY set this transmission flag, even if - structured replies have not yet been negotiated. 
- - [all options] - Additionally, clients MUST NOT set the `NBD_CMD_FLAG_DF` request - flag unless this transmission flag is set. - -* Transmission phase - - The transmission phase includes a third message type: the - structured reply, to be used for commands where the response must - include a data payload. The server MUST NOT send this reply type - unless the client has successfully negotiated structured replies - via `NBD_OPT_STRUCTURED_REPLY`. Conversely, the server MUST NOT - use a simple reply for `NBD_CMD_READ` if structured replies are - negotiated. - - [option #A1, but not #A2 or #A3] - The server MUST NOT use structured replies for requests that never - require a data payload in the response. - - Unless explicitly documented for a given request, a structured - reply MUST occupy only one message (similar to a simple reply). - However, some requests document that a structured reply MAY occupy - multiple chunks; each chunk uses a structured reply message (all - with the same value for "handle"), and the `NBD_REPLY_FLAG_DONE` - reply flag is used to identify the final chunk. Where multiple - chunks are permitted, the intermediate chunks MAY be reordered - within constraints documented by the request, and the chunks MAY - be interleaved with messages from other pending transactions; but - the final chunk MUST always end the reply. - - A structured reply message looks as follows: - - S: 32 bits, 0x668e33ef, magic (`NBD_STRUCTURED_REPLY_MAGIC`) - S: 16 bits, flags - S: 16 bits, type - S: 64 bits, handle - S: 32 bits, length of payload (unsigned) - S: *length* bytes of payload data (if *length* is non-zero) - - The use of *length* in the reply allows context-free division of - the overall server traffic into individual reply messages; the - *type* field describes how to further interpret the payload. - - * Structured reply flags - - This field of 16 bits is sent by the server as part of every - structured reply. 
- - - bit 0, `NBD_REPLY_FLAG_DONE`; the server MUST clear this bit if - more structured reply chunks will be sent for the same client - request, and MUST set this bit if this is the final reply. This - flag must always be set in response to requests which are - documented as using a structured reply, but not documented as - permitting multiple chunks. - - The server MUST NOT set any other flags without first negotiating - the extension with the client. Clients that receive an - unrecognized flag SHOULD close the connection. - - * Structured Reply types - - These values are used in the "type" field of a structured reply. - Each type determines how to interpret the "length" bytes of - payload. If the client receives an unknown or unexpected type, it - SHOULD close the connection. - - - `NBD_REPLY_TYPE_NONE` (0) - - *length* MUST be 0 (and the payload field omitted). This type - MUST always be used with the `NBD_REPLY_FLAG_DONE` bit set - (that is, it is only useful as the final reply chunk). If no - earlier error chunks were sent, then this type implies that the - overall client request is successful. - - [option #A1] - Valid as a reply to `NBD_CMD_READ`. - - [option #A2] - Valid as a reply to any request. - - - `NBD_REPLY_TYPE_ERROR` (1) - - This reply type represents an error chunk. *length* MUST be - exactly 4. The payload is structured as: - - 32 bits: error (MUST be nonzero) - - This reply represents that an error occurred, and the client MAY - NOT make any assumptions about partial success. This type SHOULD - NOT be used unless it is the final reply chunk (where the flag - `NBD_REPLY_FLAG_DONE` is set), or if it is immediately followed - by a chunk with type `NBD_REPLY_TYPE_NONE`. - - [option #A1] - Valid as a reply to `NBD_CMD_READ`. - - [option #A2] - Valid as a reply to any request. - - - `NBD_REPLY_TYPE_ERROR_OFFSET` (2) - - This reply type represents an error chunk. *length* MUST be - exactly 12. 
The payload is structured as: - - 32 bits: error (MUST be nonzero) - 64 bits: offset (unsigned) - - In addition to declaring that an error occurred, this type - provides enough additional information to inform the client - about any partial success. *offset* MUST lie within the bounds - of the original offset and length of the client's request. If - *offset* also lies within the bounds of an earlier data chunk of - the same reply, then the client MAY assume that data within that - earlier chunk is valid (while the rest of that chunk MAY be - bogus). Any later data chunks of the same reply MUST NOT - contain the offset of this chunk. - - Valid as a reply to `NBD_CMD_READ`. - - - `NBD_REPLY_TYPE_OFFSET_DATA` (3) - - This reply type represents a data chunk. *length* MUST be at - least 9. The payload is structured as: - - 64 bits: offset (unsigned) - *length - 8* bytes: data - - This reply represents the contents of *length - 8* bytes of the - file, starting at *offset*. The data MUST lie within the bounds - of the original offset and length of the client's request, and - MUST NOT overlap with any earlier data or error chunks of the - same reply. - - Valid as a reply to `NBD_CMD_READ`. - - - `NBD_REPLY_TYPE_OFFSET_HOLE` (4) - - This reply type represents a data chunk. *length* MUST be - exactly 12. The payload is structured as: - - 64 bits: offset (unsigned) - 32 bits: hole size (unsigned) - - This reply represents that *hole size* bytes of the file (which - MUST be non-zero), starting at *offset*, read as all zeroes. - The hole MUST lie within the bounds of the original offset and - length of the client's request, and MUST NOT overlap with any - earlier data or error chunks of the same reply. - - Valid as a reply to `NBD_CMD_READ`. - -* `NBD_CMD_FLAG_DF` - - The "don't fragment" bit, valid during `NBD_CMD_READ`. SHOULD be - set to 1 if the client requires the server to send at most one - data chunk in reply. 
MUST NOT be set unless the transmission - flags include `NBD_FLAG_SEND_DF`. Use of this flag MAY trigger an - `EOVERFLOW` error chunk, if the request length is too large. - -* `EOVERFLOW` - - The server SHOULD return `EOVERFLOW`, rather than `EINVAL`, when a - client has requested `NBD_CMD_FLAG_DF` for a length that is too - large to read without fragmentation. The server MUST NOT return - this error if the read request did not exceed 65,536 bytes, and - SHOULD NOT return this error if `NBD_CMD_FLAG_DF` is not set. - -* `NBD_CMD_READ` - - If structured replies were not negotiated, then a read request - MUST always be answered by a simple reply, as documented above - (using magic 0x67446698 `NBD_SIMPLE_REPLY_MAGIC`, and containing - length bytes of data according to the client's request, although - those bytes MAY be invalid if an error is returned, and the - connection MUST be closed if an error occurs after a header - claiming no error). - - If structured replies are negotiated, then a read request MUST - result in a structured reply that MAY contain one or more chunks - (each using magic 0x668e33ef `NBD_STRUCTURED_REPLY_MAGIC`), with - the following additional constraints. - - The server MAY split the reply into any number of data chunks - (reply types of `NBD_REPLY_TYPE_OFFSET_DATA` and - `NBD_REPLY_TYPE_OFFSET_HOLE`); each chunk MUST describe at least - one byte, although to minimize overhead, the server SHOULD use - chunks where lengths and offsets are an integer multiple of 512 - bytes, where possible (the first and last chunk of an unaligned - read being the most obvious place for an exception). The server - MUST NOT send data chunks that overlap each other or any earlier - error chunks, and MUST NOT send chunks that describe data outside - the offset and length of the request, but MAY send the chunks in - any order (the client MUST reassemble data chunks into the correct - order), and MAY send additional data chunks even after reporting - an error chunk. 
Note that a request for more than 2^32 - 8 bytes - MUST be split into at least two chunks, so as not to overflow the - length field of a reply while still allowing space for the offset - of each chunk. When no error is detected, the server MUST send - enough data chunks to cover the entire region described by the - offset and length of the client's request. - - To minimize traffic, the server MAY set the `NBD_REPLY_FLAG_DONE` - on the final data chunk (in which case it MUST NOT send any - further non-data chunks), but MUST NOT do so if it would still be - possible to detect an error while transmitting the chunk. If the - last data chunk is not the final reply, the server MUST send a - final chunk with type `NBD_REPLY_TYPE_NONE` (and the flag - `NBD_REPLY_FLAG_DONE` set) to indicate success, or send an error - chunk. - - If an error is detected, the server MUST still complete the - transmission of any current chunk (it SHOULD use padding bytes of - zero for any remaining data portion of - `NBD_REPLY_TYPE_OFFSET_DATA`), but MAY omit further data chunks. - The server MUST include an error chunk as one of the subsequent - chunks, but MAY defer the error reporting behind other queued - chunks. An error chunk of type `NBD_REPLY_TYPE_ERROR` implies - that the client MAY NOT make any assumptions about validity of - data chunks, and SHOULD either have `NBD_REPLY_FLAG_DONE` set as - the final chunk, or be immediately followed by a chunk of type - `NBD_REPLY_TYPE_NONE`. On the other hand, an error chunk of type - `NBD_REPLY_TYPE_ERROR_OFFSET` gives fine-grained information about - which earlier data chunk(s) encountered a failure, and MAY also be - sent in lieu of a data chunk; as such, a server MAY still usefully - follow it with further data chunks or further error offsets. - Generally, a server SHOULD NOT mix errors with offsets with a - generic error. 
As long as all errors are accompanied by offsets, - the client MAY assume that any data chunks with no subsequent - error are valid, that chunks with errors are valid up until the - reported offset, and portions of the read that do not have a - corresponding data chunk are not valid. If the final data or - error chunk did not have the `NBD_REPLY_FLAG_DONE` bit set, then - the server MUST use a final `NBD_REPLY_TYPE_NONE` chunk to - complete the reply, but the client MUST NOT treat this type as - success if an earlier data chunk was sent. - - A client MAY close the connection if it detects that the server - has sent invalid chunks (such as overlapping data, or not enough - data before claiming success). - - In order to avoid the burden of reassembly, the client MAY set the - `NBD_CMD_FLAG_DF` flag (bit 1), which instructs the server to not - fragment the reply. If this flag is set, the server MUST send at - most one data chunk, although it MAY still send multiple chunks - (the remaining chunks would be error chunks or a final type of - `NBD_REPLY_TYPE_NONE`). A server MAY reject a client's request - with the error `EOVERFLOW` if the length is too large to send - without fragmentation, in which case it MUST NOT send a data - chunk; however, the server MUST NOT use this if error the client's - requested length does not exceed 65,536 bytes. - ## About this file This file tries to document the NBD protocol as it is currently -- 2.5.5
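For anyone who wants to experiment with the wire format proposed in this patch before a real implementation exists, the following is a minimal Python sketch of a client-side decoder. It is not taken from any existing NBD client. The helper names `make_chunk`, `parse_chunk` and `assemble_read` are made up for illustration, and the code assumes big-endian (network byte order) fields, as used elsewhere in the NBD protocol but not restated in this patch. It also assumes the buffer holds only the chunks of a single `NBD_CMD_READ` reply, and it does not check for overlapping chunks or for full coverage of the request, both of which a real client may want to do.

```python
import struct

NBD_STRUCTURED_REPLY_MAGIC = 0x668E33EF
NBD_REPLY_FLAG_DONE = 1 << 0
NBD_REPLY_TYPE_NONE = 0
NBD_REPLY_TYPE_ERROR = 1
NBD_REPLY_TYPE_ERROR_OFFSET = 2
NBD_REPLY_TYPE_OFFSET_DATA = 3
NBD_REPLY_TYPE_OFFSET_HOLE = 4

# magic (32), flags (16), type (16), handle (64), length (32).
# Big-endian is an assumption; see the note above.
HEADER = struct.Struct(">IHHQI")


def make_chunk(flags, rtype, handle, payload=b""):
    """Build one structured reply chunk (hypothetical test helper)."""
    return HEADER.pack(NBD_STRUCTURED_REPLY_MAGIC, flags, rtype,
                       handle, len(payload)) + payload


def parse_chunk(buf, pos=0):
    """Return (flags, type, handle, payload, next_pos) for the chunk at pos."""
    magic, flags, rtype, handle, length = HEADER.unpack_from(buf, pos)
    if magic != NBD_STRUCTURED_REPLY_MAGIC:
        raise ValueError("not a structured reply chunk")
    start = pos + HEADER.size
    end = start + length
    if end > len(buf):
        raise ValueError("truncated payload")
    return flags, rtype, handle, bytes(buf[start:end]), end


def check_bounds(offset, size, req_offset, req_length):
    """Reject chunks that describe bytes outside the client's request."""
    if size == 0 or offset < req_offset or offset + size > req_offset + req_length:
        raise ValueError("chunk outside the requested range")


def assemble_read(stream, req_offset, req_length):
    """Reassemble one read reply; returns (data, errors).

    errors is a list of (error, offset) pairs, with offset None for
    NBD_REPLY_TYPE_ERROR chunks.
    """
    data = bytearray(req_length)  # holes stay zero-filled
    errors = []
    pos = 0
    while True:
        flags, rtype, _handle, payload, pos = parse_chunk(stream, pos)
        if rtype == NBD_REPLY_TYPE_OFFSET_DATA:
            if len(payload) < 9:
                raise ValueError("data chunk too short")
            (offset,) = struct.unpack_from(">Q", payload)
            body = payload[8:]
            check_bounds(offset, len(body), req_offset, req_length)
            rel = offset - req_offset
            data[rel:rel + len(body)] = body
        elif rtype == NBD_REPLY_TYPE_OFFSET_HOLE:
            if len(payload) != 12:
                raise ValueError("hole chunk must carry 12 bytes")
            offset, size = struct.unpack(">QI", payload)
            check_bounds(offset, size, req_offset, req_length)
        elif rtype == NBD_REPLY_TYPE_ERROR:
            if len(payload) != 4:
                raise ValueError("error chunk must carry 4 bytes")
            errors.append((struct.unpack(">I", payload)[0], None))
        elif rtype == NBD_REPLY_TYPE_ERROR_OFFSET:
            if len(payload) != 12:
                raise ValueError("error-offset chunk must carry 12 bytes")
            errors.append(struct.unpack(">IQ", payload))
        elif rtype == NBD_REPLY_TYPE_NONE:
            if payload or not flags & NBD_REPLY_FLAG_DONE:
                raise ValueError("malformed NBD_REPLY_TYPE_NONE chunk")
        else:
            raise ValueError("unknown reply type")
        if flags & NBD_REPLY_FLAG_DONE:
            return bytes(data), errors


# Usage: a 1024-byte read at offset 512, answered out of order with a
# data chunk, a hole, and a concluding NBD_REPLY_TYPE_NONE chunk.
stream = (
    make_chunk(0, NBD_REPLY_TYPE_OFFSET_DATA, 7,
               struct.pack(">Q", 1024) + b"B" * 512)
    + make_chunk(0, NBD_REPLY_TYPE_OFFSET_HOLE, 7, struct.pack(">QI", 512, 512))
    + make_chunk(NBD_REPLY_FLAG_DONE, NBD_REPLY_TYPE_NONE, 7)
)
data, errors = assemble_read(stream, 512, 1024)
```

Because every chunk carries its own length, `parse_chunk` needs no knowledge of the pending requests, which is the context-free property the patch text describes. A real client would additionally demultiplex chunks by handle, since chunks MAY be interleaved with messages from other pending transactions.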