gene-db commented on code in PR #475:
URL: https://github.com/apache/parquet-format/pull/475#discussion_r1870179241
##########
VariantEncoding.md:
##########
@@ -88,9 +88,9 @@ metadata | header |
+-----------------------+
```
-The metadata is encoded first with the `header` byte, then `dictionary_size`
which is a little-endian value of `offset_size` bytes, and represents the
number of string values in the dictionary.
+The metadata is encoded first with the `header` byte, then `dictionary_size`
which is a unsigned little-endian value of `offset_size` bytes, and represents
the number of string values in the dictionary.
Next, is an `offset` list, which contains `dictionary_size + 1` values.
-Each `offset` is a little-endian value of `offset_size` bytes, and represents
the starting byte offset of the i-th string in `bytes`.
+Each `offset` is a usigned little-endian value of `offset_size` bytes, and
represents the starting byte offset of the i-th string in `bytes`.
Review Comment:
```suggestion
Each `offset` is an unsigned little-endian value of `offset_size` bytes, and
represents the starting byte offset of the i-th string in `bytes`.
```
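For illustration only, here is a minimal Python sketch of what "unsigned little-endian value of `offset_size` bytes" means when reading one `offset`. The helper name `read_unsigned_le` is hypothetical. It is not taken from the spec or from any Parquet library.

```python
def read_unsigned_le(buf: bytes, pos: int, size: int) -> int:
    """Read an unsigned little-endian integer that is `size` bytes wide at `pos`."""
    if pos < 0 or pos + size > len(buf):
        raise ValueError("value extends past the end of the buffer")
    # Least significant byte first; the top bit is a magnitude bit, not a sign bit
    return int.from_bytes(buf[pos:pos + size], "little", signed=False)


raw = b"\x00\x80"  # one 2-byte offset, i.e. offset_size = 2
print(read_unsigned_le(raw, 0, 2))                 # 32768 (0x8000)
print(int.from_bytes(raw, "little", signed=True))  # -32768 if misread as signed
```

A signed read turns any offset with the high bit set into a negative number. That is why the text should say "unsigned" explicitly.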
##########
VariantEncoding.md:
##########
@@ -69,17 +69,17 @@ The entire metadata is encoded as the following diagram
shows:
metadata | header |
+-----------------------+
| |
- : dictionary_size : <-- little-endian, `offset_size` bytes
+ : dictionary_size : <-- unsigned little-endian, `offset_size`
bytes
| |
+-----------------------+
| |
- : offset : <-- little-endian, `offset_size` bytes
+ : offset : <-- unsigned little-endian,
`offset_size` bytes
Review Comment:
NIT:
```suggestion
: offset : <-- unsigned little-endian,
`offset_size` bytes
```
##########
VariantEncoding.md:
##########
@@ -88,9 +88,9 @@ metadata | header |
+-----------------------+
```
-The metadata is encoded first with the `header` byte, then `dictionary_size`
which is a little-endian value of `offset_size` bytes, and represents the
number of string values in the dictionary.
+The metadata is encoded first with the `header` byte, then `dictionary_size`
which is a unsigned little-endian value of `offset_size` bytes, and represents
the number of string values in the dictionary.
Review Comment:
```suggestion
The metadata is encoded first with the `header` byte, then `dictionary_size`
which is an unsigned little-endian value of `offset_size` bytes, and represents
the number of string values in the dictionary.
```
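The following minimal Python sketch decodes the metadata in the order described above: `header`, then `dictionary_size`, then the `dictionary_size + 1` offsets, then the string `bytes`. It rests on several assumptions. `offset_size` has already been extracted from the header byte, and the header's bit layout is not modeled here. Strings are treated as UTF-8. The last offset is taken to mark the end of the final string, as the `+ 1` count implies. `decode_dictionary` is a hypothetical name, and the sketch does no bounds validation.

```python
def decode_dictionary(metadata: bytes, offset_size: int) -> list:
    """Return the dictionary strings from a Variant metadata buffer (sketch)."""

    def read_u(pos: int) -> int:
        # dictionary_size and every offset are unsigned little-endian, offset_size bytes
        return int.from_bytes(metadata[pos:pos + offset_size], "little", signed=False)

    pos = 1  # skip the 1-byte header
    dictionary_size = read_u(pos)
    pos += offset_size
    # dictionary_size + 1 offsets: offset i is where string i starts in `bytes`
    offsets = [read_u(pos + i * offset_size) for i in range(dictionary_size + 1)]
    bytes_start = pos + (dictionary_size + 1) * offset_size
    return [
        metadata[bytes_start + offsets[i]:bytes_start + offsets[i + 1]].decode("utf-8")
        for i in range(dictionary_size)
    ]


# Placeholder header byte 0x00, offset_size = 1, two strings: "id" and "name"
example = bytes([0x00, 2, 0, 2, 6]) + b"idname"
print(decode_dictionary(example, offset_size=1))  # ['id', 'name']
```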
##########
VariantEncoding.md:
##########
@@ -69,17 +69,17 @@ The entire metadata is encoded as the following diagram
shows:
metadata | header |
+-----------------------+
| |
- : dictionary_size : <-- little-endian, `offset_size` bytes
+ : dictionary_size : <-- unsigned little-endian, `offset_size`
bytes
| |
+-----------------------+
| |
- : offset : <-- little-endian, `offset_size` bytes
+ : offset : <-- unsigned little-endian,
`offset_size` bytes
| |
+-----------------------+
:
+-----------------------+
| |
- : offset : <-- little-endian, `offset_size` bytes
+ : offset : <-- unsigned little-endian,
`offset_size` bytes
Review Comment:
NIT:
```suggestion
: offset : <-- unsigned little-endian,
`offset_size` bytes
```
##########
VariantEncoding.md:
##########
@@ -313,10 +313,10 @@ array value_data | |
| |
+-----------------------+
```
-An array `value_data` begins with `num_elements`, a 1-byte or 4-byte
little-endian value, representing the number of elements in the array.
+An array `value_data` begins with `num_elements`, a 1-byte or 4-byte unsigned
little-endian value, representing the number of elements in the array.
The size in bytes of `num_elements` is indicated by `is_large` in the
`value_header`.
Next, is a `field_offset` list.
-There are `num_elements + 1` number of entries and each `field_offset` is a
little-endian value of `field_offset_size` bytes.
+There are `num_elements + 1` number of entries and each `field_offset` is a
unsigned little-endian value of `field_offset_size` bytes.
Review Comment:
```suggestion
There are `num_elements + 1` number of entries and each `field_offset` is an
unsigned little-endian value of `field_offset_size` bytes.
```
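A minimal Python sketch of reading the start of an array `value_data`: `num_elements`, followed by the `num_elements + 1` unsigned `field_offset` entries. It assumes `is_large` and `field_offset_size` were already decoded from the `value_header`. It does not decode the element values themselves. `decode_array_header` and the returned `data_start` (the position just past the offset list) are hypothetical names.

```python
def decode_array_header(value_data: bytes, is_large: bool, field_offset_size: int):
    """Return (num_elements, field_offsets, data_start) for an array value_data (sketch)."""

    def read_u(pos: int, size: int) -> int:
        return int.from_bytes(value_data[pos:pos + size], "little", signed=False)

    num_size = 4 if is_large else 1  # is_large selects a 4-byte num_elements
    num_elements = read_u(0, num_size)
    pos = num_size
    # num_elements + 1 entries, each field_offset_size bytes, unsigned little-endian
    field_offsets = [
        read_u(pos + i * field_offset_size, field_offset_size)
        for i in range(num_elements + 1)
    ]
    data_start = pos + (num_elements + 1) * field_offset_size
    return num_elements, field_offsets, data_start


# is_large = False, field_offset_size = 1: 2 elements, offsets 0, 3, 5
print(decode_array_header(bytes([2, 0, 3, 5]), is_large=False, field_offset_size=1))
# (2, [0, 3, 5], 4)
```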
##########
VariantEncoding.md:
##########
@@ -254,13 +254,13 @@ object value_data | |
| |
+-----------------------+
```
-An object `value_data` begins with `num_elements`, a 1-byte or 4-byte
little-endian value, representing the number of elements in the object.
+An object `value_data` begins with `num_elements`, a 1-byte or 4-byte unsigned
little-endian value, representing the number of elements in the object.
The size in bytes of `num_elements` is indicated by `is_large` in the
`value_header`.
Next, is a list of `field_id` values.
-There are `num_elements` number of entries and each `field_id` is a
little-endian value of `field_id_size` bytes.
+There are `num_elements` number of entries and each `field_id` is a unsigned
little-endian value of `field_id_size` bytes.
A `field_id` is an index into the dictionary in the metadata.
The `field_id` list is followed by a `field_offset` list.
-There are `num_elements + 1` number of entries and each `field_offset` is a
little-endian value of `field_offset_size` bytes.
+There are `num_elements + 1` number of entries and each `field_offset` is a
unsigned little-endian value of `field_offset_size` bytes.
Review Comment:
```suggestion
There are `num_elements + 1` number of entries and each `field_offset` is an
unsigned little-endian value of `field_offset_size` bytes.
```
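For comparison with the array case, here is a minimal Python sketch of the object layout. It reads `num_elements`, then `num_elements` unsigned `field_id` values, then `num_elements + 1` unsigned `field_offset` values. As above, `is_large`, `field_id_size` and `field_offset_size` are assumed to come from the `value_header`. `decode_object_header` is a hypothetical name.

```python
def decode_object_header(value_data: bytes, is_large: bool,
                         field_id_size: int, field_offset_size: int):
    """Return (field_ids, field_offsets, data_start) for an object value_data (sketch)."""

    def read_u(pos: int, size: int) -> int:
        # All of these integers are unsigned little-endian
        return int.from_bytes(value_data[pos:pos + size], "little", signed=False)

    num_size = 4 if is_large else 1
    num_elements = read_u(0, num_size)
    pos = num_size
    # num_elements field_id entries, each field_id_size bytes
    field_ids = [read_u(pos + i * field_id_size, field_id_size) for i in range(num_elements)]
    pos += num_elements * field_id_size
    # followed by num_elements + 1 field_offset entries, each field_offset_size bytes
    field_offsets = [
        read_u(pos + i * field_offset_size, field_offset_size)
        for i in range(num_elements + 1)
    ]
    pos += (num_elements + 1) * field_offset_size
    return field_ids, field_offsets, pos


# 2 fields with field_ids 1 and 0; offsets 0, 2, 4; all sizes 1 byte
print(decode_object_header(bytes([2, 1, 0, 0, 2, 4]), False, 1, 1))
# ([1, 0], [0, 2, 4], 6)
```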
##########
VariantEncoding.md:
##########
@@ -254,13 +254,13 @@ object value_data | |
| |
+-----------------------+
```
-An object `value_data` begins with `num_elements`, a 1-byte or 4-byte
little-endian value, representing the number of elements in the object.
+An object `value_data` begins with `num_elements`, a 1-byte or 4-byte unsigned
little-endian value, representing the number of elements in the object.
The size in bytes of `num_elements` is indicated by `is_large` in the
`value_header`.
Next, is a list of `field_id` values.
-There are `num_elements` number of entries and each `field_id` is a
little-endian value of `field_id_size` bytes.
+There are `num_elements` number of entries and each `field_id` is a unsigned
little-endian value of `field_id_size` bytes.
Review Comment:
```suggestion
There are `num_elements` number of entries and each `field_id` is an
unsigned little-endian value of `field_id_size` bytes.
```
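A `field_id` is an index into the metadata dictionary, so the "unsigned" wording has a practical consequence: a signed decode of a corrupt id can look valid. The minimal Python sketch below shows this. It assumes the dictionary has already been decoded into a list of strings, and `resolve_field_names` is a hypothetical helper.

```python
def resolve_field_names(field_ids: list, dictionary: list) -> list:
    """Map each field_id (an index into the metadata dictionary) to its string."""
    names = []
    for fid in field_ids:
        # Decoded as unsigned, a field_id is never negative, so one upper-bound check suffices
        if fid >= len(dictionary):
            raise ValueError(f"field_id {fid} out of range for dictionary of size {len(dictionary)}")
        names.append(dictionary[fid])
    return names


dictionary = ["id", "name"]
raw = b"\xff"  # a corrupt 1-byte field_id
unsigned_id = int.from_bytes(raw, "little", signed=False)  # 255
signed_id = int.from_bytes(raw, "little", signed=True)     # -1
print(dictionary[signed_id])  # "name": Python's negative indexing hides the error
try:
    resolve_field_names([unsigned_id], dictionary)
except ValueError as err:
    print(err)  # the unsigned decode is rejected instead
```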
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]