alamb commented on code in PR #5703:
URL: https://github.com/apache/arrow-rs/pull/5703#discussion_r1584760289
##########
arrow-cast/src/cast/mod.rs:
##########
@@ -7100,6 +7116,54 @@ mod tests {
FixedSizeListArray::from(list_data)
}
+ #[test]
+ fn test_cast_map_containers() {
+ let keys = vec!["0", "1", "5", "6", "7"];
+ let values: Vec<&[u8]> = vec![b"test_val_1", b"test_val_2",
b"long_test_val_3", b"4", b"test_val_5"];
+ let entry_offsets = vec![0u32, 1, 1, 4, 5, 5];
+
+ let array = MapArray::new_from_strings(
+ keys.into_iter(),
+ &BinaryArray::from_vec(values),
+ &entry_offsets,
+ ).unwrap();
+
+ let new_type = DataType::Map(Arc::new(Field::new("entries_new",
DataType::Struct(
Review Comment:
Would it be possible to add an assert here on what the type of `array` was
(to show it wasn't the same as `new-type`)?
##########
arrow-cast/src/cast/mod.rs:
##########
@@ -7100,6 +7116,54 @@ mod tests {
FixedSizeListArray::from(list_data)
}
+ #[test]
+ fn test_cast_map_containers() {
+ let keys = vec!["0", "1", "5", "6", "7"];
Review Comment:
Can you please add test coverage for `None` (aka when the keys / values also
have null values)?
I think it might also make this test clearer if you used `MapBuilder` to
construct the inputs:
https://docs.rs/arrow/latest/arrow/array/builder/struct.MapBuilder.html
##########
arrow-cast/src/cast/map.rs:
##########
@@ -0,0 +1,49 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::cast::*;
+
+
+/// Helper function that takes a map container and casts the inner datatype.
+pub(crate) fn cast_map_values(
+ from: &MapArray,
+ to_data_type: &DataType,
+ cast_options: &CastOptions,
+ to_ordered: bool,
+) -> Result<ArrayRef, ArrowError> {
+ let key_array = cast_with_options(from.keys(),
&to_data_type.key_type().ok_or(ArrowError::CastError("map is missing key
type".to_string()))?, cast_options)?;
+ let value_array = cast_with_options(from.values(),
&to_data_type.value_type().ok_or(ArrowError::CastError("map is missing value
type".to_string()))?, cast_options)?;
+
+ let to_fields = Fields::from(vec![
+ to_data_type.key_field().ok_or(ArrowError::CastError("map is missing
key field".to_string()))?,
+ to_data_type.value_field().ok_or(ArrowError::CastError("map is missing
value field".to_string()))?,
+ ]);
+
+ let entries_field = if let DataType::Map(entries_field, _) = to_data_type {
+ Some(entries_field)
+ } else {
+ None
+ }.ok_or(ArrowError::CastError("to_data_type is not a map
type.".to_string()))?;
Review Comment:
Since this can only be called internally, it seems like this should be
'impossible' to hit -- maybe we could add an 'interna' or something to make it
clear it is not expected to happen
Perhaps something like
```suggestion
let entries_field = if let DataType::Map(entries_field, _) =
to_data_type {
Some(entries_field)
} else {
return Err(ArrowError::CastError("Internal Error: to_data_type is not
a map type.".to_string())))
};
```
##########
arrow-cast/src/cast/mod.rs:
##########
@@ -157,6 +159,12 @@ pub fn can_cast_types(from_type: &DataType, to_type:
&DataType) -> bool {
(_, LargeList(list_to)) => can_cast_types(from_type,
list_to.data_type()),
(_, FixedSizeList(list_to,size)) if *size == 1 => {
can_cast_types(from_type, list_to.data_type())},
+ (Map(_,ordered_from), Map(_, ordered_to)) if ordered_from ==
ordered_to =>
Review Comment:
See my comment below -- I think we could avoid key_type/value_type here
##########
arrow-cast/src/cast/mod.rs:
##########
@@ -710,6 +718,14 @@ pub fn cast_with_options(
(_, FixedSizeList(ref to, size)) if *size == 1 => {
cast_values_to_fixed_size_list(array, to, *size, cast_options)
}
+ (Map(_, ordered1), Map(_, ordered2)) if ordered1 == ordered2 =>
+ cast_map_values(
+ array.as_map_opt()
Review Comment:
I think this could simply use `as_map()` given this is in the `match` clause
when data type was a map. As in there is no reason to return this error as it
is not expected and panic'ing would be consistent with the rest of the codebase
if the data type didn't match the array
##########
arrow-schema/src/datatype.rs:
##########
@@ -670,6 +670,42 @@ impl DataType {
pub fn new_fixed_size_list(data_type: DataType, size: i32, nullable: bool)
-> Self {
DataType::FixedSizeList(Arc::new(Field::new_list_field(data_type,
nullable)), size)
}
+
+ /// Gets the key field in a map data type. For all other types returns
None.
+ pub fn key_field(&self) -> Option<FieldRef> {
Review Comment:
Since you are already matching on `DataType::Map` above you'll already have
`entries_field` available -- so we could avoid adding this new API to Datatype
##########
arrow-cast/src/cast/mod.rs:
##########
@@ -7100,6 +7116,54 @@ mod tests {
FixedSizeListArray::from(list_data)
}
+ #[test]
+ fn test_cast_map_containers() {
+ let keys = vec!["0", "1", "5", "6", "7"];
+ let values: Vec<&[u8]> = vec![b"test_val_1", b"test_val_2",
b"long_test_val_3", b"4", b"test_val_5"];
+ let entry_offsets = vec![0u32, 1, 1, 4, 5, 5];
+
+ let array = MapArray::new_from_strings(
+ keys.into_iter(),
+ &BinaryArray::from_vec(values),
+ &entry_offsets,
+ ).unwrap();
+
+ let new_type = DataType::Map(Arc::new(Field::new("entries_new",
DataType::Struct(
+ vec![
+ Field::new("key_new", DataType::Utf8, false),
+ Field::new("value_values", DataType::Binary, false)
+ ].into()
+ ), false)), false);
+
+ let array = cast(
+ &array,
+ &new_type.clone()).unwrap();
+ let map_array = array.as_map();
+
+ assert_eq!(new_type, map_array.data_type().clone());
+
+ let key_string = map_array
+ .keys()
+ .as_any()
+ .downcast_ref::<StringArray>()
+ .unwrap()
+ .into_iter()
+ .flatten()
+ .collect::<Vec<_>>();
+ assert_eq!(&key_string, &vec!["0", "1", "5", "6", "7"]);
+
+ let values_string_array = cast(map_array
+ .values(),
&DataType::Utf8).unwrap();
+ let values_string = values_string_array
+ .as_any()
+ .downcast_ref::<StringArray>()
+ .unwrap()
+ .into_iter()
+ .flatten()
+ .collect::<Vec<_>>();
+ assert_eq!(&values_string, &vec!["test_val_1", "test_val_2",
"long_test_val_3", "4", "test_val_5"]);
+ }
+
#[test]
Review Comment:
Can you also please add a negative tests for:
1. not being able to cast two `MapArray`s with different values for
`ordered` ?
2. Two MapArrays where the values can not be cast (maybe where the values
are `Interval`s and trying to cast to another map array where the values are
`Duration`s)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]