gene-db commented on code in PR #475:
URL: https://github.com/apache/parquet-format/pull/475#discussion_r1870179241

##########
VariantEncoding.md:
##########
@@ -88,9 +88,9 @@ metadata           |        header         |
                    +-----------------------+
 ```
-The metadata is encoded first with the `header` byte, then `dictionary_size` which is a little-endian value of `offset_size` bytes, and represents the number of string values in the dictionary.
+The metadata is encoded first with the `header` byte, then `dictionary_size` which is a unsigned little-endian value of `offset_size` bytes, and represents the number of string values in the dictionary.
 Next, is an `offset` list, which contains `dictionary_size + 1` values.
-Each `offset` is a little-endian value of `offset_size` bytes, and represents the starting byte offset of the i-th string in `bytes`.
+Each `offset` is a usigned little-endian value of `offset_size` bytes, and represents the starting byte offset of the i-th string in `bytes`.

Review Comment:
   ```suggestion
   Each `offset` is an unsigned little-endian value of `offset_size` bytes, and represents the starting byte offset of the i-th string in `bytes`.
   ```

##########
VariantEncoding.md:
##########
@@ -69,17 +69,17 @@ The entire metadata is encoded as the following diagram shows:
 metadata           |        header         |
                    +-----------------------+
                    |                       |
-                   :    dictionary_size    :  <-- little-endian, `offset_size` bytes
+                   :    dictionary_size    :  <-- unsigned little-endian, `offset_size` bytes
                    |                       |
                    +-----------------------+
                    |                       |
-                   :        offset         :  <-- little-endian, `offset_size` bytes
+                   :        offset         :  <-- unsigned little-endian, `offset_size` bytes

Review Comment:
   NIT:
   ```suggestion
                      :        offset         :  <-- unsigned little-endian, `offset_size` bytes
   ```

##########
VariantEncoding.md:
##########
@@ -88,9 +88,9 @@ metadata           |        header         |
                    +-----------------------+
 ```
-The metadata is encoded first with the `header` byte, then `dictionary_size` which is a little-endian value of `offset_size` bytes, and represents the number of string values in the dictionary.
+The metadata is encoded first with the `header` byte, then `dictionary_size` which is a unsigned little-endian value of `offset_size` bytes, and represents the number of string values in the dictionary.

Review Comment:
   ```suggestion
   The metadata is encoded first with the `header` byte, then `dictionary_size` which is an unsigned little-endian value of `offset_size` bytes, and represents the number of string values in the dictionary.
   ```

##########
VariantEncoding.md:
##########
@@ -69,17 +69,17 @@ The entire metadata is encoded as the following diagram shows:
 metadata           |        header         |
                    +-----------------------+
                    |                       |
-                   :    dictionary_size    :  <-- little-endian, `offset_size` bytes
+                   :    dictionary_size    :  <-- unsigned little-endian, `offset_size` bytes
                    |                       |
                    +-----------------------+
                    |                       |
-                   :        offset         :  <-- little-endian, `offset_size` bytes
+                   :        offset         :  <-- unsigned little-endian, `offset_size` bytes
                    |                       |
                    +-----------------------+
                                :
                    +-----------------------+
                    |                       |
-                   :        offset         :  <-- little-endian, `offset_size` bytes
+                   :        offset         :  <-- unsigned little-endian, `offset_size` bytes

Review Comment:
   NIT:
   ```suggestion
                      :        offset         :  <-- unsigned little-endian, `offset_size` bytes
   ```

##########
VariantEncoding.md:
##########
@@ -313,10 +313,10 @@ array value_data   |                       |
                    |                       |
                    +-----------------------+
 ```
-An array `value_data` begins with `num_elements`, a 1-byte or 4-byte little-endian value, representing the number of elements in the array.
+An array `value_data` begins with `num_elements`, a 1-byte or 4-byte unsigned little-endian value, representing the number of elements in the array.
 The size in bytes of `num_elements` is indicated by `is_large` in the `value_header`.
 Next, is a `field_offset` list.
-There are `num_elements + 1` number of entries and each `field_offset` is a little-endian value of `field_offset_size` bytes.
+There are `num_elements + 1` number of entries and each `field_offset` is a unsigned little-endian value of `field_offset_size` bytes.

Review Comment:
   ```suggestion
   There are `num_elements + 1` number of entries and each `field_offset` is an unsigned little-endian value of `field_offset_size` bytes.
   ```

##########
VariantEncoding.md:
##########
@@ -254,13 +254,13 @@ object value_data  |                       |
                    |                       |
                    +-----------------------+
 ```
-An object `value_data` begins with `num_elements`, a 1-byte or 4-byte little-endian value, representing the number of elements in the object.
+An object `value_data` begins with `num_elements`, a 1-byte or 4-byte unsigned little-endian value, representing the number of elements in the object.
 The size in bytes of `num_elements` is indicated by `is_large` in the `value_header`.
 Next, is a list of `field_id` values.
-There are `num_elements` number of entries and each `field_id` is a little-endian value of `field_id_size` bytes.
+There are `num_elements` number of entries and each `field_id` is a unsigned little-endian value of `field_id_size` bytes.
 A `field_id` is an index into the dictionary in the metadata.
 The `field_id` list is followed by a `field_offset` list.
-There are `num_elements + 1` number of entries and each `field_offset` is a little-endian value of `field_offset_size` bytes.
+There are `num_elements + 1` number of entries and each `field_offset` is a unsigned little-endian value of `field_offset_size` bytes.

Review Comment:
   ```suggestion
   There are `num_elements + 1` number of entries and each `field_offset` is an unsigned little-endian value of `field_offset_size` bytes.
   ```

##########
VariantEncoding.md:
##########
@@ -254,13 +254,13 @@ object value_data  |                       |
                    |                       |
                    +-----------------------+
 ```
-An object `value_data` begins with `num_elements`, a 1-byte or 4-byte little-endian value, representing the number of elements in the object.
+An object `value_data` begins with `num_elements`, a 1-byte or 4-byte unsigned little-endian value, representing the number of elements in the object.
 The size in bytes of `num_elements` is indicated by `is_large` in the `value_header`.
 Next, is a list of `field_id` values.
-There are `num_elements` number of entries and each `field_id` is a little-endian value of `field_id_size` bytes.
+There are `num_elements` number of entries and each `field_id` is a unsigned little-endian value of `field_id_size` bytes.

Review Comment:
   ```suggestion
   There are `num_elements` number of entries and each `field_id` is an unsigned little-endian value of `field_id_size` bytes.
   ```
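
For illustration only, here is a minimal Python sketch of how a reader might apply the "unsigned little-endian value of `offset_size` bytes" wording from the hunks above when decoding the metadata dictionary. The helper names `read_unsigned_le` and `decode_metadata_dictionary` are hypothetical and do not come from the PR. The sketch assumes that the caller already knows `offset_size` (the header bit layout is not shown in these hunks), that the string `bytes` immediately follow the offset list, that each offset is relative to the start of `bytes`, and that the strings are UTF-8.

```python
from typing import List


def read_unsigned_le(buf: bytes, pos: int, size: int) -> int:
    """Read `size` bytes at `pos` as an unsigned little-endian integer."""
    return int.from_bytes(buf[pos:pos + size], byteorder="little", signed=False)


def decode_metadata_dictionary(metadata: bytes, offset_size: int) -> List[str]:
    """Decode the dictionary strings from a metadata buffer.

    Assumption: `offset_size` is supplied by the caller rather than parsed
    from the header byte, because the header layout is not part of the
    quoted hunks.
    """
    pos = 1  # skip the single `header` byte

    # `dictionary_size`: number of strings, `offset_size` bytes wide.
    dictionary_size = read_unsigned_le(metadata, pos, offset_size)
    pos += offset_size

    # The `offset` list holds `dictionary_size + 1` values.
    offsets = []
    for _ in range(dictionary_size + 1):
        offsets.append(read_unsigned_le(metadata, pos, offset_size))
        pos += offset_size

    # Assumption: string bytes start right after the offset list, and
    # offsets[i]..offsets[i + 1] delimits the i-th string.
    strings_start = pos
    return [
        metadata[strings_start + offsets[i]:strings_start + offsets[i + 1]].decode("utf-8")
        for i in range(dictionary_size)
    ]
```

The `signed=False` argument is the point of the clarification: with a 1-byte width, the byte `0xFF` decodes to 255 rather than -1, so sizes and offsets can never come out negative.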