Repository: parquet-format
Updated Branches:
  refs/heads/master 863875e0b -> ddc18a7af

PARQUET-1125: Add UUID logical type.

UUIDs are commonly used as unique identifiers. A binary representation
will reduce memory when writing or building bloom filters and will
reduce cycles needed to compare values.

This commit is based on PARQUET-906 / PR #51.

Author: Ryan Blue

Closes #71 from rdblue/PARQUET-1125-add-uuid-logical-type and squashes the following commits:

dc01707 [Ryan Blue] PARQUET-1125: Add UUID logical type.


Project: http://git-wip-us.apache.org/repos/asf/parquet-format/repo
Commit: http://git-wip-us.apache.org/repos/asf/parquet-format/commit/ddc18a7a
Tree: http://git-wip-us.apache.org/repos/asf/parquet-format/tree/ddc18a7a
Diff: http://git-wip-us.apache.org/repos/asf/parquet-format/diff/ddc18a7a

Branch: refs/heads/master
Commit: ddc18a7af21127f9100096b5b356d1cad888d174
Parents: 863875e
Author: Ryan Blue
Authored: Tue Oct 10 12:53:19 2017 -0700
Committer: Ryan Blue
Committed: Tue Oct 10 12:53:19 2017 -0700

----------------------------------------------------------------------
 LogicalTypes.md                | 13 ++++++++++++-
 src/main/thrift/parquet.thrift |  1 +
 2 files changed, 13 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/parquet-format/blob/ddc18a7a/LogicalTypes.md
----------------------------------------------------------------------
diff --git a/LogicalTypes.md b/LogicalTypes.md
index c50b96b..2c80256 100644
--- a/LogicalTypes.md
+++ b/LogicalTypes.md
@@ -48,7 +48,18 @@ was converted from an enumerated type in another data model (e.g. Thrift, Avro,
 Applications using a data model lacking a native enum type should interpret `ENUM`
 annotated field as a UTF-8 encoded string.
 
-The sort order used for `ENUM`s is `UNSIGNED` byte-wise comparison.
+The sort order used for `ENUM` values is unsigned byte-wise comparison.
+
+### UUID
+
+`UUID` annotates a 16-byte fixed-length binary. The value is encoded using
+big-endian, so that `00112233-4455-6677-8899-aabbccddeeff` is encoded as the
+bytes `00 11 22 33 44 55 66 77 88 99 aa bb cc dd ee ff`
+(This example is from [wikipedia's UUID page][wiki-uuid]).
+
+The sort order used for `UUID` values is unsigned byte-wise comparison.
+
+[wiki-uuid]: https://en.wikipedia.org/wiki/Universally_unique_identifier
 
 ## Numeric Types
 

http://git-wip-us.apache.org/repos/asf/parquet-format/blob/ddc18a7a/src/main/thrift/parquet.thrift
----------------------------------------------------------------------
diff --git a/src/main/thrift/parquet.thrift b/src/main/thrift/parquet.thrift
index 4c76cbd..a4e193e 100644
--- a/src/main/thrift/parquet.thrift
+++ b/src/main/thrift/parquet.thrift
@@ -226,6 +226,7 @@ struct Statistics {
 
 /** Empty structs to use as logical type annotations */
 struct StringType {} // allowed for BINARY, must be encoded with UTF-8
+struct UUIDType {} // allowed for FIXED[16], must encoded raw UUID bytes
 struct MapType {} // see LogicalTypes.md
 struct ListType {} // see LogicalTypes.md
 struct EnumType {} // allowed for BINARY, must be encoded with UTF-8

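Both the `ENUM` and `UUID` sections specify unsigned byte-wise comparison as the sort order. The Python sketch below illustrates that ordering under stated assumptions: bytes are compared lexicographically as values 0 to 255, and when one value is a prefix of another the shorter one sorts first (relevant only to variable-length `ENUM` values, since `UUID` values are always 16 bytes). `compare_signed` is included purely for contrast, mimicking a language with a signed byte type such as Java; both function names are hypothetical.

```python
import uuid
from functools import cmp_to_key


def compare_unsigned(a: bytes, b: bytes) -> int:
    """Hypothetical helper: unsigned byte-wise (lexicographic) comparison.

    Returns -1, 0 or 1. Each byte is read as an unsigned value 0..255.
    """
    for x, y in zip(a, b):  # iterating bytes yields ints in 0..255
        if x != y:
            return -1 if x < y else 1
    # Common prefix is equal: the shorter value sorts first (assumption for ENUM).
    return (len(a) > len(b)) - (len(a) < len(b))


def compare_signed(a: bytes, b: bytes) -> int:
    """For contrast only: compares bytes as signed values -128..127."""
    def to_signed(v: int) -> int:
        return v - 256 if v >= 128 else v

    for x, y in zip(a, b):
        sx, sy = to_signed(x), to_signed(y)
        if sx != sy:
            return -1 if sx < sy else 1
    return (len(a) > len(b)) - (len(a) < len(b))


low = uuid.UUID("7fffffff-ffff-ffff-ffff-ffffffffffff").bytes
high = uuid.UUID("80000000-0000-0000-0000-000000000000").bytes

print(compare_unsigned(low, high))  # -1: 0x7f < 0x80 when unsigned
print(compare_signed(low, high))    # 1: 0x80 reads as -128 when signed

# The unsigned comparator agrees with Python's built-in bytes ordering.
print(sorted([high, low], key=cmp_to_key(compare_unsigned)) == sorted([high, low]))  # True
```

Because `UUID` values are fixed-length and big-endian, this unsigned byte-wise order is the same as ordering the UUIDs by their 128-bit unsigned integer value (`uuid.UUID.int` in Python).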