scovich commented on code in PR #8122: URL: https://github.com/apache/arrow-rs/pull/8122#discussion_r2274319800
########## parquet-variant-compute/src/variant_array.rs: ########## @@ -229,107 +328,138 @@ pub enum ShreddingState { // TODO: add missing state where there is neither value nor typed_value // Missing { metadata: BinaryViewArray }, /// This variant has no typed_value field - Unshredded { - metadata: BinaryViewArray, - value: BinaryViewArray, - }, + Unshredded { value: BinaryViewArray }, /// This variant has a typed_value field and no value field /// meaning it is the shredded type - Typed { - metadata: BinaryViewArray, - typed_value: ArrayRef, - }, - /// Partially shredded: - /// * value is an object - /// * typed_value is a shredded object. + PerfectlyShredded { typed_value: ArrayRef }, Review Comment: > My understanding was that if `typed_value` was non null, it contains the value and an implementation doesn't even have to check the `value` field For primitive values and arrays, yes -- `(NULL, NULL)` is an invalid case and `(Variant::Null, NULL)` translates to `Variant::Null` (which casts to `NULL` of any other type). Objects have a multi-step process to access field `f`: 1. Is the `typed_value` NULL? * Yes => Not even an object => have to use `value` (or just fail, since nothing casts to struct) * It is illegal for `value` to also be NULL, because this is the top-level * No => [partially] shredded object => continue (ignore `value`) 2. Does the column `typed_value.f` exist? * No => `f` was not in the shredding schema => have to check `value` which may or may not be a variant object * Yes => `f` was in the shredding schema => continue 3. Is `typed_value.f.typed_value` NULL? * Yes => This row does not contain `f`, or `f` is the wrong type => fall back to `typed_value.f.value` * If that is also NULL then `f` is NULL; otherwise, `f` could be any variant object (including `Variant::Null`) * No => This row contains `f` and it is shredded (ignore `typed_value.f.value`) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org