scovich commented on code in PR #523: URL: https://github.com/apache/parquet-format/pull/523#discussion_r2430310159
##########
VariantShredding.md:
##########
@@ -202,21 +202,22 @@ As a result, reads when both `value` and `typed_value`
are defined may be incons
The table below shows how the series of objects in the first column would be
stored:
-| Event object
| `value` | `typed_value` |
`typed_value.event_type.value` | `typed_value.event_type.typed_value` |
`typed_value.event_ts.value` | `typed_value.event_ts.typed_value` | Notes
|
-|------------------------------------------------------------------------------------|-----------------------------------|---------------|--------------------------------|--------------------------------------|------------------------------|------------------------------------|--------------------------------------------------|
-| `{"event_type": "noop", "event_ts": 1729794114937}`
| null | non-null | null
| `noop` | null
| 1729794114937 | Fully shredded object
|
-| `{"event_type": "login", "event_ts": 1729794146402, "email":
"[email protected]"}` | `{"email": "[email protected]"}` | non-null |
null | `login` | null
| 1729794146402 | Partially shredded
object |
-| `{"error_msg": "malformed: ..."}`
| `{"error_msg", "malformed: ..."}` | non-null | null
| null | null
| null | Object with all shredded fields missing
|
-| `"malformed: not an object"`
| `malformed: not an object` | null |
| |
| | Not an object (stored as Variant
string) |
-| `{"event_ts": 1729794240241, "click": "_button"}`
| `{"click": "_button"}` | non-null | null
| null | null
| 1729794240241 | Field `event_type` is missing
|
-| `{"event_type": null, "event_ts": 1729794954163}`
| null | non-null | `00` (field exists,
is null) | null | null
| 1729794954163 | Field `event_type` is present and is
null |
-| `{"event_type": "noop", "event_ts": "2024-10-24"}`
| null | non-null | null
| `noop` | `"2024-10-24"`
| null | Field `event_ts` is present but not a
timestamp |
-| `{ }`
| null | non-null | null
| null | null
| null | Object is present but empty
|
-| null
| `00` (null) | null |
| |
| | Object/value is null
|
-| missing
| null | null |
| |
| | Object/value is missing
|
-| INVALID
| `{"event_type": "login"}` | non-null | null
| `login` | null
| 1729795057774 | INVALID: Shredded field is present in
`value` |
-| INVALID
| `"a"` | non-null | null
| null | null
| null | INVALID: `typed_value` is present for
non-object |
-| INVALID
| `02 00` (object with 0 fields) | null |
| |
| | INVALID: `typed_value` is null for
object |
+| Event object
| `value` | `typed_value` |
`typed_value.event_type.value` | `typed_value.event_type.typed_value` |
`typed_value.event_ts.value` | `typed_value.event_ts.typed_value` | Notes
|
+|-----------------------------------------------------------------------------------|-----------------------------------|---------------|--------------------------------|--------------------------------------|------------------------------|------------------------------------|--------------------------------------------------------------------------------------------------------------|
+| `{"event_type": "noop", "event_ts": 1729794114937}`
| null | non-null | null
| `noop` | null
| 1729794114937 | Fully shredded object
|
+| `{"event_type": "login", "event_ts": 1729794146402, "email":
"[email protected]"}` | `{"email": "[email protected]"}` | non-null | null
| `login` | null
| 1729794146402 | Partially shredded
object
|
+| `{"error_msg": "malformed: ..."}`
| `{"error_msg", "malformed: ..."}` | non-null | null
| null | null
| null | Object with all shredded fields missing
|
+| `"malformed: not an object"`
| `malformed: not an object` | null |
| |
| | Not an object (stored as Variant string)
|
+| `{"event_ts": 1729794240241, "click": "_button"}`
| `{"click": "_button"}` | non-null | null
| null | null
| 1729794240241 | Field `event_type` is missing
|
+| `{"event_type": null, "event_ts": 1729794954163}`
| null | non-null | `00` (field exists,
is null) | null | null
| 1729794954163 | Field `event_type` is present and is
null |
+| `{"event_type": "noop", "event_ts": "2024-10-24"}`
| null | non-null | null
| `noop` | `"2024-10-24"`
| null | Field `event_ts` is present but not a
timestamp |
+| `{ }`
| null | non-null | null
| null | null
| null | Object is present but empty
|
+| null
| `00` (null) | null |
| |
| | Object/value is null
|
+| missing
| null | null |
| |
| | Object/value is missing
|
+| INVALID CASE: `{"event_type": "login", "event_ts": 1729795057774}`
| `{"event_type": "login"}` | non-null | null
| `login` | null
| 1729795057774 | INVALID CASE: Shredded field is present
in `value` |
+| INVALID CASE: `{"event_type": "login"}`
| `{"event_type": "login"}` | null |
| |
| | INVALID CASE: Shredded field is present
in `value`, while `typed_value` is null |
+| INVALID CASE: `"a"`
| `"a"` | non-null | null
| null | null
| null | INVALID CASE: `typed_value` is present
for non-object while being null, even though `value` is not an object |
+| INVALID CASE: `{}`
| `02 00` (object with 0 fields) | null |
| |
| | INVALID CASE: `typed_value` is null for
object |
Review Comment:
The table is already super wide; I'm not sure ` CASE` adds enough
information to be worth the space it consumes?
##########
VariantShredding.md:
##########
@@ -202,21 +202,22 @@ As a result, reads when both `value` and `typed_value`
are defined may be incons
The table below shows how the series of objects in the first column would be
stored:
-| Event object
| `value` | `typed_value` |
`typed_value.event_type.value` | `typed_value.event_type.typed_value` |
`typed_value.event_ts.value` | `typed_value.event_ts.typed_value` | Notes
|
-|------------------------------------------------------------------------------------|-----------------------------------|---------------|--------------------------------|--------------------------------------|------------------------------|------------------------------------|--------------------------------------------------|
-| `{"event_type": "noop", "event_ts": 1729794114937}`
| null | non-null | null
| `noop` | null
| 1729794114937 | Fully shredded object
|
-| `{"event_type": "login", "event_ts": 1729794146402, "email":
"[email protected]"}` | `{"email": "[email protected]"}` | non-null |
null | `login` | null
| 1729794146402 | Partially shredded
object |
-| `{"error_msg": "malformed: ..."}`
| `{"error_msg", "malformed: ..."}` | non-null | null
| null | null
| null | Object with all shredded fields missing
|
-| `"malformed: not an object"`
| `malformed: not an object` | null |
| |
| | Not an object (stored as Variant
string) |
-| `{"event_ts": 1729794240241, "click": "_button"}`
| `{"click": "_button"}` | non-null | null
| null | null
| 1729794240241 | Field `event_type` is missing
|
-| `{"event_type": null, "event_ts": 1729794954163}`
| null | non-null | `00` (field exists,
is null) | null | null
| 1729794954163 | Field `event_type` is present and is
null |
-| `{"event_type": "noop", "event_ts": "2024-10-24"}`
| null | non-null | null
| `noop` | `"2024-10-24"`
| null | Field `event_ts` is present but not a
timestamp |
-| `{ }`
| null | non-null | null
| null | null
| null | Object is present but empty
|
-| null
| `00` (null) | null |
| |
| | Object/value is null
|
-| missing
| null | null |
| |
| | Object/value is missing
|
-| INVALID
| `{"event_type": "login"}` | non-null | null
| `login` | null
| 1729795057774 | INVALID: Shredded field is present in
`value` |
-| INVALID
| `"a"` | non-null | null
| null | null
| null | INVALID: `typed_value` is present for
non-object |
-| INVALID
| `02 00` (object with 0 fields) | null |
| |
| | INVALID: `typed_value` is null for
object |
+| Event object
| `value` | `typed_value` |
`typed_value.event_type.value` | `typed_value.event_type.typed_value` |
`typed_value.event_ts.value` | `typed_value.event_ts.typed_value` | Notes
|
+|-----------------------------------------------------------------------------------|-----------------------------------|---------------|--------------------------------|--------------------------------------|------------------------------|------------------------------------|--------------------------------------------------------------------------------------------------------------|
+| `{"event_type": "noop", "event_ts": 1729794114937}`
| null | non-null | null
| `noop` | null
| 1729794114937 | Fully shredded object
|
+| `{"event_type": "login", "event_ts": 1729794146402, "email":
"[email protected]"}` | `{"email": "[email protected]"}` | non-null | null
| `login` | null
| 1729794146402 | Partially shredded
object
|
+| `{"error_msg": "malformed: ..."}`
| `{"error_msg", "malformed: ..."}` | non-null | null
| null | null
| null | Object with all shredded fields missing
|
+| `"malformed: not an object"`
| `malformed: not an object` | null |
| |
| | Not an object (stored as Variant string)
|
+| `{"event_ts": 1729794240241, "click": "_button"}`
| `{"click": "_button"}` | non-null | null
| null | null
| 1729794240241 | Field `event_type` is missing
|
+| `{"event_type": null, "event_ts": 1729794954163}`
| null | non-null | `00` (field exists,
is null) | null | null
| 1729794954163 | Field `event_type` is present and is
null |
+| `{"event_type": "noop", "event_ts": "2024-10-24"}`
| null | non-null | null
| `noop` | `"2024-10-24"`
| null | Field `event_ts` is present but not a
timestamp |
+| `{ }`
| null | non-null | null
| null | null
| null | Object is present but empty
|
+| null
| `00` (null) | null |
| |
| | Object/value is null
|
+| missing
| null | null |
| |
| | Object/value is missing
|
+| INVALID CASE: `{"event_type": "login", "event_ts": 1729795057774}`
| `{"event_type": "login"}` | non-null | null
| `login` | null
| 1729795057774 | INVALID CASE: Shredded field is present
in `value` |
+| INVALID CASE: `{"event_type": "login"}`
| `{"event_type": "login"}` | null |
| |
| | INVALID CASE: Shredded field is present
in `value`, while `typed_value` is null |
+| INVALID CASE: `"a"`
| `"a"` | non-null | null
| null | null
| null | INVALID CASE: `typed_value` is present
for non-object while being null, even though `value` is not an object |
Review Comment:
Not understanding the "while being null" part? What about just:
> INVALID CASE: `typed_value` is present and `value` is not an object
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
