sdf-jkl commented on code in PR #9598:
URL: https://github.com/apache/arrow-rs/pull/9598#discussion_r3048591246
##########
parquet-variant-compute/src/variant_get.rs:
##########
@@ -86,15 +88,14 @@ pub(crate) fn follow_shredded_path_element<'a>(
return Ok(missing_path_step());
};
- let struct_array = field.as_struct_opt().ok_or_else(|| {
- // TODO: Should we blow up? Or just end the traversal and let the normal
- // variant pathing code sort out the mess that it must anyway be
- // prepared to handle?
- ArrowError::InvalidArgumentError(format!(
- "Expected Struct array while following path, got {}",
- field.data_type(),
- ))
- })?;
+ // The field might be a VariantArray (StructArray) if shredded,
+ // or it might be a primitive array. Only proceed if it's a StructArray.
+ let Some(struct_array) = field.as_struct_opt() else {
+ // Field exists but is not a StructArray, so it cannot be
+ // followed further. Fall back to the value column if present,
+ // otherwise the path is missing.
Review Comment:
I agree with @codephage2020. This is malformed data rather than a
JSONPath-handling case.
<details>
<summary>Spark appears to treat this as <code>malformedVariant</code> as well</summary>
What Codex spewed out for me:
**Path-extraction route (closest to `variant_get` path-step behavior):**
1. `ParquetRowConverter.end` calls `assembleVariantStruct(...)`
https://github.com/apache/spark/blob/c8695495569e4058a758b111ebc942fc4906494c/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala#L1036-L1043
2. `assembleVariantStruct` calls `extractField(...)`
https://github.com/apache/spark/blob/c8695495569e4058a758b111ebc942fc4906494c/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/SparkShreddingUtils.scala#L827-L844
3. In `extractField`, a null object-field slot triggers malformed variant
https://github.com/apache/spark/blob/c8695495569e4058a758b111ebc942fc4906494c/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/SparkShreddingUtils.scala#L795-L799
4. That throws `QueryExecutionErrors.malformedVariant()`
https://github.com/apache/spark/blob/c8695495569e4058a758b111ebc942fc4906494c/sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala#L2995-L2998
**Rebuild route (directly via `common/variant/.../ShreddingUtils.java`):**
1. `ParquetRowConverter.end` calls `assembleVariant(...)`
https://github.com/apache/spark/blob/c8695495569e4058a758b111ebc942fc4906494c/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala#L1039-L1040
2. `assembleVariant` calls `ShreddingUtils.rebuild(...)`
https://github.com/apache/spark/blob/c8695495569e4058a758b111ebc942fc4906494c/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/SparkShreddingUtils.scala#L821-L823
3. `ShreddingUtils.rebuild` throws `malformedVariant()` on an invalid shredded layout
https://github.com/apache/spark/blob/c8695495569e4058a758b111ebc942fc4906494c/common/variant/src/main/java/org/apache/spark/types/variant/ShreddingUtils.java#L49-L63
https://github.com/apache/spark/blob/c8695495569e4058a758b111ebc942fc4906494c/common/variant/src/main/java/org/apache/spark/types/variant/ShreddingUtils.java#L131-L135
https://github.com/apache/spark/blob/c8695495569e4058a758b111ebc942fc4906494c/common/variant/src/main/java/org/apache/spark/types/variant/ShreddingUtils.java#L167-L170
4. `malformedVariant()` definition
https://github.com/apache/spark/blob/c8695495569e4058a758b111ebc942fc4906494c/common/variant/src/main/java/org/apache/spark/types/variant/VariantUtil.java#L177-L180
</details>
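
To make the suggestion concrete, here is a minimal sketch of the strict behavior, assuming simplified stand-in types. `ShreddedField`, `PathStep`, `MalformedVariant`, and `follow_field` are all hypothetical names for illustration; none of them are arrow-rs or `parquet-variant-compute` APIs. The only point is that an absent field and a field of the wrong shape take different branches.

```rust
// Hypothetical stand-ins for the Arrow types; NOT the arrow-rs API.
#[derive(Debug, PartialEq)]
enum ShreddedField {
    /// A properly shredded field, i.e. a struct array.
    Struct,
    /// Anything else, identified here only by its type name.
    Primitive(&'static str),
}

/// Outcome of following one path element (hypothetical).
#[derive(Debug, PartialEq)]
enum PathStep {
    /// Traversal can continue into the shredded struct.
    Shredded,
    /// The field does not exist in the shredding schema.
    Missing,
}

/// Error for data that violates the shredding layout (hypothetical).
#[derive(Debug, PartialEq)]
struct MalformedVariant(String);

/// Strict handling: a missing field is a normal outcome, but a field that
/// exists and is not a struct is reported as malformed data rather than
/// being treated as a missing path.
fn follow_field(field: Option<&ShreddedField>) -> Result<PathStep, MalformedVariant> {
    match field {
        None => Ok(PathStep::Missing),
        Some(ShreddedField::Struct) => Ok(PathStep::Shredded),
        Some(ShreddedField::Primitive(type_name)) => Err(MalformedVariant(format!(
            "Expected Struct array while following path, got {}",
            type_name
        ))),
    }
}
```

With this shape, `follow_field(None)` yields `Ok(PathStep::Missing)`, while `follow_field(Some(&ShreddedField::Primitive("Int64")))` yields an error carrying the same message the removed `ok_or_else` branch produced.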