alamb commented on code in PR #5703:
URL: https://github.com/apache/arrow-rs/pull/5703#discussion_r1584760289


##########
arrow-cast/src/cast/mod.rs:
##########
@@ -7100,6 +7116,54 @@ mod tests {
         FixedSizeListArray::from(list_data)
     }
 
+    #[test]
+    fn test_cast_map_containers() {
+        let keys = vec!["0", "1", "5", "6", "7"];
+        let values: Vec<&[u8]> = vec![b"test_val_1", b"test_val_2", 
b"long_test_val_3", b"4", b"test_val_5"];
+        let entry_offsets = vec![0u32, 1, 1, 4, 5, 5];
+
+        let array = MapArray::new_from_strings(
+            keys.into_iter(),
+            &BinaryArray::from_vec(values),
+            &entry_offsets,
+        ).unwrap();
+
+        let new_type = DataType::Map(Arc::new(Field::new("entries_new", 
DataType::Struct(

Review Comment:
   Would it be possible to add an assert here on what the type of `array` was 
(to show it wasn't the same as `new-type`)?



##########
arrow-cast/src/cast/mod.rs:
##########
@@ -7100,6 +7116,54 @@ mod tests {
         FixedSizeListArray::from(list_data)
     }
 
+    #[test]
+    fn test_cast_map_containers() {
+        let keys = vec!["0", "1", "5", "6", "7"];

Review Comment:
   Can you please add test coverage for `None` (aka when the keys / values also 
have null values)? 
   
   I think it might also make this test clearer if you used `MapBuilder` to 
construct the inputs: 
https://docs.rs/arrow/latest/arrow/array/builder/struct.MapBuilder.html
   
   



##########
arrow-cast/src/cast/map.rs:
##########
@@ -0,0 +1,49 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::cast::*;
+
+
+/// Helper function that takes a map container and casts the inner datatype.
+pub(crate) fn cast_map_values(
+    from: &MapArray,
+    to_data_type: &DataType,
+    cast_options: &CastOptions,
+    to_ordered: bool,
+) -> Result<ArrayRef, ArrowError> {
+    let key_array = cast_with_options(from.keys(), 
&to_data_type.key_type().ok_or(ArrowError::CastError("map is missing key 
type".to_string()))?, cast_options)?;
+    let value_array = cast_with_options(from.values(), 
&to_data_type.value_type().ok_or(ArrowError::CastError("map is missing value 
type".to_string()))?, cast_options)?;
+
+    let to_fields = Fields::from(vec![
+        to_data_type.key_field().ok_or(ArrowError::CastError("map is missing 
key field".to_string()))?,
+        to_data_type.value_field().ok_or(ArrowError::CastError("map is missing 
value field".to_string()))?,
+    ]);
+
+    let entries_field = if let DataType::Map(entries_field, _) = to_data_type {
+        Some(entries_field)
+    } else {
+        None
+    }.ok_or(ArrowError::CastError("to_data_type is not a map 
type.".to_string()))?;

Review Comment:
   Since this can only be called internally, it seems like this should be 
'impossible' to hit -- maybe we could add an 'interna' or something to make it 
clear it is not expected to happen
   
   Perhaps something like 
   ```suggestion
       let entries_field = if let DataType::Map(entries_field, _) = 
to_data_type {
           Some(entries_field)
       } else {
          return Err(ArrowError::CastError("Internal Error: to_data_type is not 
a map type.".to_string())))
       };
   ```



##########
arrow-cast/src/cast/mod.rs:
##########
@@ -157,6 +159,12 @@ pub fn can_cast_types(from_type: &DataType, to_type: 
&DataType) -> bool {
         (_, LargeList(list_to)) => can_cast_types(from_type, 
list_to.data_type()),
         (_, FixedSizeList(list_to,size)) if *size == 1 => {
             can_cast_types(from_type, list_to.data_type())},
+        (Map(_,ordered_from), Map(_, ordered_to)) if ordered_from == 
ordered_to =>

Review Comment:
   See my comment below -- I think we could avoid key_type/value_type here



##########
arrow-cast/src/cast/mod.rs:
##########
@@ -710,6 +718,14 @@ pub fn cast_with_options(
         (_, FixedSizeList(ref to, size)) if *size == 1 => {
             cast_values_to_fixed_size_list(array, to, *size, cast_options)
         }
+        (Map(_, ordered1), Map(_, ordered2)) if ordered1 == ordered2 =>
+            cast_map_values(
+                array.as_map_opt()

Review Comment:
   I think this could simply use `as_map()` given this is in the `match` clause 
when data type was a map. As in there is no reason to return this error as it 
is not expected and panic'ing would be consistent with the rest of the codebase 
if the data type didn't match the array



##########
arrow-schema/src/datatype.rs:
##########
@@ -670,6 +670,42 @@ impl DataType {
     pub fn new_fixed_size_list(data_type: DataType, size: i32, nullable: bool) 
-> Self {
         DataType::FixedSizeList(Arc::new(Field::new_list_field(data_type, 
nullable)), size)
     }
+
+    /// Gets the key field in a map data type.  For all other types returns 
None.
+    pub fn key_field(&self) -> Option<FieldRef> {

Review Comment:
   Since you are already matching on `DataType::Map` above you'll already have 
`entries_field` available -- so we could avoid adding this new API to Datatype



##########
arrow-cast/src/cast/mod.rs:
##########
@@ -7100,6 +7116,54 @@ mod tests {
         FixedSizeListArray::from(list_data)
     }
 
+    #[test]
+    fn test_cast_map_containers() {
+        let keys = vec!["0", "1", "5", "6", "7"];
+        let values: Vec<&[u8]> = vec![b"test_val_1", b"test_val_2", 
b"long_test_val_3", b"4", b"test_val_5"];
+        let entry_offsets = vec![0u32, 1, 1, 4, 5, 5];
+
+        let array = MapArray::new_from_strings(
+            keys.into_iter(),
+            &BinaryArray::from_vec(values),
+            &entry_offsets,
+        ).unwrap();
+
+        let new_type = DataType::Map(Arc::new(Field::new("entries_new", 
DataType::Struct(
+            vec![
+                Field::new("key_new", DataType::Utf8, false),
+                Field::new("value_values", DataType::Binary, false)
+            ].into()
+        ), false)), false);
+
+        let array = cast(
+            &array,
+            &new_type.clone()).unwrap();
+        let map_array = array.as_map();
+
+        assert_eq!(new_type, map_array.data_type().clone());
+
+        let key_string = map_array
+            .keys()
+            .as_any()
+            .downcast_ref::<StringArray>()
+            .unwrap()
+            .into_iter()
+            .flatten()
+            .collect::<Vec<_>>();
+        assert_eq!(&key_string, &vec!["0", "1", "5", "6", "7"]);
+
+        let values_string_array = cast(map_array
+                                           .values(), 
&DataType::Utf8).unwrap();
+        let values_string = values_string_array
+            .as_any()
+            .downcast_ref::<StringArray>()
+            .unwrap()
+            .into_iter()
+            .flatten()
+            .collect::<Vec<_>>();
+        assert_eq!(&values_string, &vec!["test_val_1", "test_val_2", 
"long_test_val_3", "4", "test_val_5"]);
+    }
+
     #[test]

Review Comment:
   Can you also please add a negative tests for:
   1. not being able to cast two `MapArray`s with different values for 
`ordered`  ?
   2. Two MapArrays where the values can not be cast (maybe where the values 
are `Interval`s and trying to cast to another map array where the values are 
`Duration`s)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to