Repository: cassandra
Updated Branches:
  refs/heads/cassandra-2.0 b1825e6f8 -> 782b0b616


Add type serialization formats to native protocol spec

Patch by Tyler Hobbs; reviewed by Benjamin Lerer for CASSANDRA-8495


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/782b0b61
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/782b0b61
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/782b0b61

Branch: refs/heads/cassandra-2.0
Commit: 782b0b616f871c90ec6a09b2fc27bd1d2d33caa0
Parents: b1825e6
Author: Tyler Hobbs <[email protected]>
Authored: Fri Feb 13 17:03:54 2015 -0600
Committer: Tyler Hobbs <[email protected]>
Committed: Fri Feb 13 17:03:54 2015 -0600

----------------------------------------------------------------------
 doc/native_protocol_v1.spec | 136 +++++++++++++++++++++++++++++++++-----
 doc/native_protocol_v2.spec | 137 ++++++++++++++++++++++++++++++++++-----
 2 files changed, 239 insertions(+), 34 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/cassandra/blob/782b0b61/doc/native_protocol_v1.spec
----------------------------------------------------------------------
diff --git a/doc/native_protocol_v1.spec b/doc/native_protocol_v1.spec
index 08cb91e..bc2bb78 100644
--- a/doc/native_protocol_v1.spec
+++ b/doc/native_protocol_v1.spec
@@ -34,7 +34,7 @@ Table of Contents
       4.2.5.5. Schema_change
     4.2.6. EVENT
   5. Compression
-  6. Collection types
+  6. Data Type Serialization Formats
   7. Error codes


@@ -169,8 +169,8 @@ Table of Contents
   To describe the layout of the frame body for the messages in Section 4, we
   define the following:

-    [int]          A 4 bytes integer
-    [short]        A 2 bytes unsigned integer
+    [int]          A 4 byte integer
+    [short]        A 2 byte unsigned integer
     [string]       A [short] n, followed by n bytes representing an UTF-8
                    string.
     [long string]  An [int] n, followed by n bytes representing an UTF-8 string.
@@ -525,22 +525,124 @@ Table of Contents
   flag (see Section 2.2) is set.


-6. Collection types
+6. Data Type Serialization Formats

-  This section describe the serialization format for the collection types:
-  list, map and set. This serialization format is both useful to decode values
-  returned in RESULT messages but also to encode values for EXECUTE ones.
+  This sections describes the serialization formats for all CQL data types
+  supported by Cassandra through the native protocol. These serialization
+  formats should be used by client drivers to encode values for EXECUTE
+  messages. Cassandra will use these formats when returning values in
+  RESULT messages.

-  The serialization formats are:
-     List: a [short] n indicating the size of the list, followed by n elements.
-           Each element is [short bytes] representing the serialized element
-           value.
-     Map: a [short] n indicating the size of the map, followed by n entries.
-          Each entry is composed of two [short bytes] representing the key and
-          the value of the entry map.
-     Set: a [short] n indicating the size of the set, followed by n elements.
-          Each element is [short bytes] representing the serialized element
-          value.
+  All values are represented as [bytes] in EXECUTE and RESULT messages.
+  The [bytes] format includes an int prefix denoting the length of the value.
+  For that reason, the serialization formats described here will not include
+  a length component.
+
+  For legacy compatibility reasons, note that most non-string types support
+  "empty" values (i.e. a value with zero length). An empty value is distinct
+  from NULL, which is encoded with a negative length.
+
+  As with the rest of the native protocol, all encodings are big-endian.
+
+6.1. ascii
+
+  A sequence of bytes in the ASCII range [0, 127]. Bytes with values outside of
+  this range will result in a validation error.
+
+6.2 bigint
+
+  An eight-byte two's complement integer.
+
+6.3 blob
+
+  Any sequence of bytes.
+
+6.4 boolean
+
+  A single byte. A value of 0 denotes "false"; any other value denotes "true".
+  (However, it is recommended that a value of 1 be used to represent "true".)
+
+6.5 decimal
+
+  The decimal format represents an arbitrary-precision number. It contains an
+  [int] "scale" component followed by a varint encoding (see section 6.17)
+  of the unscaled value. The encoded value represents "<unscaled>E<-scale>".
+  In other words, "<unscaled> * 10 ^ (-1 * <scale>)".
+
+6.6 double
+
+  An eight-byte floating point number in the IEEE 754 binary64 format.
+
+6.7 float
+
+  An four-byte floating point number in the IEEE 754 binary32 format.
+
+6.8 inet
+
+  A 4 byte or 16 byte sequence denoting an IPv4 or IPv6 address, respectively.
+
+6.9 int
+
+  A four-byte two's complement integer.
+
+6.10 list
+
+  A [short] n indicating the number of elements in the list, followed by n
+  elements. Each element is [short bytes] representing the serialized value.
+
+6.11 map
+
+  A [short] n indicating the number of key/value pairs in the map, followed by
+  n entries. Each entry is composed of two [short bytes] representing the key
+  and value.
+
+6.12 set
+
+  A [short] n indicating the number of elements in the set, followed by n
+  elements. Each element is [short bytes] representing the serialized value.
+
+6.13 text
+
+  A sequence of bytes conforming to the UTF-8 specifications.
+
+6.14 timestamp
+
+  An eight-byte two's complement integer representing a millisecond-precision
+  offset from the unix epoch (00:00:00, January 1st, 1970). Negative values
+  represent a negative offset from the epoch.
+
+6.15 uuid
+
+  A 16 byte sequence representing any valid UUID as defined by RFC 4122.
+
+6.16 varchar
+
+  An alias of the "text" type.
+
+6.17 varint
+
+  A variable-length two's complement encoding of a signed integer.
+
+  The following examples may help implementors of this spec:
+
+  Value | Encoding
+  ------|---------
+      0 |     0x00
+      1 |     0x01
+    127 |     0x7F
+    128 |   0x0080
+     -1 |     0xFF
+   -128 |     0x80
+   -129 |   0xFF7F
+
+  Note that positive numbers must use a most-significant byte with a value
+  less than 0x80, because a most-significant bit of 1 indicates a negative
+  value. Implementors should pad positive values that have a MSB >= 0x80
+  with a leading 0x00 byte.
+
+6.18 timeuuid
+
+  A 16 byte sequence representing a version 1 UUID as defined by RFC 4122.


 7. Error codes

http://git-wip-us.apache.org/repos/asf/cassandra/blob/782b0b61/doc/native_protocol_v2.spec
----------------------------------------------------------------------
diff --git a/doc/native_protocol_v2.spec b/doc/native_protocol_v2.spec
index 11d380f..ef54099 100644
--- a/doc/native_protocol_v2.spec
+++ b/doc/native_protocol_v2.spec
@@ -37,7 +37,7 @@ Table of Contents
     4.2.7. AUTH_CHALLENGE
     4.2.8. AUTH_SUCCESS
   5. Compression
-  6. Collection types
+  6. Data Type Serialization Formats
   7. Result paging
   8. Error codes
   9. Changes from v1
@@ -186,8 +186,8 @@ Table of Contents
   To describe the layout of the frame body for the messages in Section 4, we
   define the following:

-    [int]          A 4 bytes integer
-    [short]        A 2 bytes unsigned integer
+    [int]          A 4 byte integer
+    [short]        A 2 byte unsigned integer
     [string]       A [short] n, followed by n bytes representing an UTF-8
                    string.
     [long string]  An [int] n, followed by n bytes representing an UTF-8 string.
@@ -673,22 +673,125 @@ Table of Contents
   avaivable on some installation.


-6. Collection types
+6. Data Type Serialization Formats

-  This section describe the serialization format for the collection types:
-  list, map and set. This serialization format is both useful to decode values
-  returned in RESULT messages but also to encode values for EXECUTE ones.
+  This sections describes the serialization formats for all CQL data types
+  supported by Cassandra through the native protocol. These serialization
+  formats should be used by client drivers to encode values for EXECUTE
+  messages. Cassandra will use these formats when returning values in
+  RESULT messages.

-  The serialization formats are:
-     List: a [short] n indicating the size of the list, followed by n elements.
-           Each element is [short bytes] representing the serialized element
-           value.
-     Map: a [short] n indicating the size of the map, followed by n entries.
-          Each entry is composed of two [short bytes] representing the key and
-          the value of the entry map.
-     Set: a [short] n indicating the size of the set, followed by n elements.
-          Each element is [short bytes] representing the serialized element
-          value.
+  All values are represented as [bytes] in EXECUTE and RESULT messages.
+  The [bytes] format includes an int prefix denoting the length of the value.
+  For that reason, the serialization formats described here will not include
+  a length component.
+
+  For legacy compatibility reasons, note that most non-string types support
+  "empty" values (i.e. a value with zero length). An empty value is distinct
+  from NULL, which is encoded with a negative length.
+
+  As with the rest of the native protocol, all encodings are big-endian.
+
+6.1. ascii
+
+  A sequence of bytes in the ASCII range [0, 127]. Bytes with values outside of
+  this range will result in a validation error.
+
+6.2 bigint
+
+  An eight-byte two's complement integer.
+
+6.3 blob
+
+  Any sequence of bytes.
+
+6.4 boolean
+
+  A single byte. A value of 0 denotes "false"; any other value denotes "true".
+  (However, it is recommended that a value of 1 be used to represent "true".)
+
+6.5 decimal
+
+  The decimal format represents an arbitrary-precision number. It contains an
+  [int] "scale" component followed by a varint encoding (see section 6.17)
+  of the unscaled value. The encoded value represents "<unscaled>E<-scale>".
+  In other words, "<unscaled> * 10 ^ (-1 * <scale>)".
+
+6.6 double
+
+  An eight-byte floating point number in the IEEE 754 binary64 format.
+
+6.7 float
+
+  An four-byte floating point number in the IEEE 754 binary32 format.
+
+6.8 inet
+
+  A 4 byte or 16 byte sequence denoting an IPv4 or IPv6 address, respectively.
+
+6.9 int
+
+  A four-byte two's complement integer.
+
+6.10 list
+
+  A [short] n indicating the number of elements in the list, followed by n
+  elements. Each element is [short bytes] representing the serialized value.
+
+6.11 map
+
+  A [short] n indicating the number of key/value pairs in the map, followed by
+  n entries. Each entry is composed of two [short bytes] representing the key
+  and value.
+
+6.12 set
+
+  A [short] n indicating the number of elements in the set, followed by n
+  elements. Each element is [short bytes] representing the serialized value.
+
+6.13 text
+
+  A sequence of bytes conforming to the UTF-8 specifications.
+
+6.14 timestamp
+
+  An eight-byte two's complement integer representing a millisecond-precision
+  offset from the unix epoch (00:00:00, January 1st, 1970). Negative values
+  represent a negative offset from the epoch.
+
+6.15 uuid
+
+  A 16 byte sequence representing any valid UUID as defined by RFC 4122.
+
+6.16 varchar
+
+  An alias of the "text" type.
+
+6.17 varint
+
+  A variable-length two's complement encoding of a signed integer.
+
+  The following examples may help implementors of this spec:
+
+  Value | Encoding
+  ------|---------
+      0 |     0x00
+      1 |     0x01
+    127 |     0x7F
+    128 |   0x0080
+    129 |   0x0081
+     -1 |     0xFF
+   -128 |     0x80
+   -129 |   0xFF7F
+
+  Note that positive numbers must use a most-significant byte with a value
+  less than 0x80, because a most-significant bit of 1 indicates a negative
+  value. Implementors should pad positive values that have a MSB >= 0x80
+  with a leading 0x00 byte.
+
+6.18 timeuuid
+
+  A 16 byte sequence representing a version 1 UUID as defined by RFC 4122.


 7. Result paging
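
The varint (6.17) and decimal (6.5) formats added by this patch can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not code from the Cassandra tree or from any driver: the function names encode_varint, decode_varint, encode_decimal and decode_decimal are hypothetical, and the [int] scale is assumed to be a signed 4 byte big-endian integer. The [bytes] length prefix is left out, as in the spec text.

```python
import struct
from decimal import Decimal


def encode_varint(value: int) -> bytes:
    """Shortest big-endian two's complement encoding of a signed integer."""
    # Grow the byte count until the signed range of that width holds `value`.
    # This yields the leading 0x00 pad for positive values whose top bit is set.
    length = 1
    while not -(1 << (8 * length - 1)) <= value < (1 << (8 * length - 1)):
        length += 1
    return value.to_bytes(length, byteorder="big", signed=True)


def decode_varint(data: bytes) -> int:
    """Inverse of encode_varint."""
    return int.from_bytes(data, byteorder="big", signed=True)


def encode_decimal(value: Decimal) -> bytes:
    """[int] scale followed by the varint-encoded unscaled value."""
    if not value.is_finite():
        raise ValueError("NaN and infinities have no unscaled/scale form")
    sign, digits, exponent = value.as_tuple()
    unscaled = int("".join(str(d) for d in digits))
    if sign:
        unscaled = -unscaled
    # The value equals unscaled * 10 ** exponent, so scale = -exponent.
    # Assumption: the scale is packed as a signed 4 byte big-endian integer.
    return struct.pack(">i", -exponent) + encode_varint(unscaled)


def decode_decimal(data: bytes) -> Decimal:
    """Inverse of encode_decimal."""
    (scale,) = struct.unpack(">i", data[:4])
    unscaled = decode_varint(data[4:])
    sign = 1 if unscaled < 0 else 0
    digits = tuple(int(d) for d in str(abs(unscaled)))
    # Building from a tuple is exact and does not depend on context precision.
    return Decimal((sign, digits, -scale))
```

Applied to the values in the 6.17 table, encode_varint reproduces the listed encodings: 128 becomes 0x0080 and -129 becomes 0xFF7F. Computing the width from the signed range, rather than from the bit length of the magnitude, is one way to get the required 0x00 padding without a special case.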
