scovich commented on code in PR #9599:
URL: https://github.com/apache/arrow-rs/pull/9599#discussion_r2984085062
##########
parquet-variant-compute/src/variant_to_arrow.rs:
##########
@@ -941,7 +942,14 @@ where
match value {
Variant::List(list) => {
for element in list.iter() {
- self.element_builder.append_value(element)?;
+ match element {
Review Comment:
First: That example sounds wrong? Shouldn't a variant_get that hits
`Variant::Null` _always_ return NULL?
`Variant::Null` coerces to a NULL of any type, no? (unlike, say encountering
`Variant::Float` when `List<Int64>` was requested). Maybe this is a special
case symptom of https://github.com/apache/arrow-rs/issues/9606?
Second: But anyway, this specific code is unpacking list _elements_ not the
list itself?
What specific variant value and json path did you hit problems for?
Third: This code that unpacks list elements used to just
`append_value(value)` (without ever knowing `value = Variant::Null`), which
would produce a non-NULL value column entry containing `Variant::Null`. The new
code specifically checks for `Variant::Null` and calls `append_null()`. But the
element builder was initialized in `ArrayElement` mode, so `append_null` does
the exact same thing -- produces a non-NULL value column entry containing
`Variant::Null`. Based on that, I'm struggling to see how this change would
alter behavior at all?
##########
parquet-variant-compute/src/variant_to_arrow.rs:
##########
@@ -941,7 +942,14 @@ where
match value {
Variant::List(list) => {
for element in list.iter() {
- self.element_builder.append_value(element)?;
+ match element {
Review Comment:
Oh wait... this is going _from_ shredded variant _to_ some kind of
`ListArray`, in response to `variant_get(..., List<Int64>)`. That operation
shouldn't produce shredded variant as output, so we shouldn't need to constrict
it with a `NullValue` in the first place?? It should produce an actual list.
If we want to extract a shredded list as output, we'd need to pass
`variant_get(..., List<Struct>)` where the list is annotated as variant
extension type and the struct has `value` and `typed_value` fields of the
appropriate types?
##########
parquet-variant-compute/src/variant_get.rs:
##########
@@ -4184,6 +4184,59 @@ mod test {
}
}
+ #[test]
+ fn test_variant_get_list_like_unsafe_cast_preserves_null_elements() {
+ let string_array: ArrayRef = Arc::new(StringArray::from(vec![r#"[1,
null, 3]"#]));
+ let variant_array =
ArrayRef::from(json_to_variant(&string_array).unwrap());
+ let cast_options = CastOptions {
+ safe: false,
+ ..Default::default()
+ };
+ let options = GetOptions::new()
+ .with_as_type(Some(FieldRef::from(Field::new(
+ "result",
+ DataType::List(Arc::new(Field::new("item", DataType::Int64,
true))),
+ true,
+ ))))
+ .with_cast_options(cast_options);
+
+ let result = variant_get(&variant_array, options).unwrap();
+ let element_struct = result
+ .as_any()
+ .downcast_ref::<ListArray>()
+ .unwrap()
+ .values()
+ .as_any()
+ .downcast_ref::<StructArray>()
+ .unwrap();
+
+ let value = element_struct
+ .column_by_name("value")
+ .unwrap()
+ .as_any()
+ .downcast_ref::<BinaryViewArray>()
+ .unwrap();
+ let typed_value = element_struct
+ .column_by_name("typed_value")
+ .unwrap()
+ .as_any()
+ .downcast_ref::<Int64Array>()
+ .unwrap();
+
+ assert_eq!(typed_value.len(), 3);
+ assert_eq!(typed_value.value(0), 1);
Review Comment:
assert that typed_value is_valid for rows 0 and 2?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]