alamb commented on code in PR #6180:
URL: https://github.com/apache/arrow-rs/pull/6180#discussion_r1705525874


##########
arrow-cast/src/cast/mod.rs:
##########
@@ -1282,6 +1290,15 @@ pub fn cast_with_options(
             Date64 => parse_string_view::<Date64Type>(array, cast_options),
             Binary => cast_view_to_byte::<StringViewType, 
GenericBinaryType<i32>>(array),
             LargeBinary => cast_view_to_byte::<StringViewType, 
GenericBinaryType<i64>>(array),
+            BinaryView => {
+                if let Some(arr) = 
array.as_any().downcast_ref::<StringViewArray>() {
+                    Ok(Arc::new(arr.clone().to_binary_view()))
+                } else {
+                    Err(ArrowError::CastError(
+                        "Cannot cast StringView to BinaryView".to_string(),
+                    ))
+                }
+            }

Review Comment:
   I think it would be more consistent here to use `as_string_view` rather than 
downcast_ref
   
   ```suggestion
               BinaryView => 
Ok(Arc::new(array.as_string_view().clone().to_binary_view())),
   ```



##########
arrow-cast/src/cast/mod.rs:
##########
@@ -1413,10 +1437,34 @@ pub fn cast_with_options(
                 "Casting from {from_type:?} to {to_type:?} not supported",
             ))),
         },
-        (BinaryView, Binary) => cast_view_to_byte::<BinaryViewType, 
GenericBinaryType<i32>>(array),
-        (BinaryView, LargeBinary) => {
-            cast_view_to_byte::<BinaryViewType, GenericBinaryType<i64>>(array)
-        }
+        (BinaryView, _) => match to_type {
+            Binary => cast_view_to_byte::<BinaryViewType, 
GenericBinaryType<i32>>(array),
+            LargeBinary => cast_view_to_byte::<BinaryViewType, 
GenericBinaryType<i64>>(array),
+            Utf8 => {
+                let binary_arr =
+                    cast_view_to_byte::<BinaryViewType, 
GenericBinaryType<i32>>(array)?;
+                cast_binary_to_string::<i32>(&binary_arr, cast_options)

Review Comment:
   I double checked this and I think it does the right (fast) thing which is to 
do utf8 validation in one call



##########
arrow-cast/src/cast/mod.rs:
##########
@@ -1413,10 +1437,34 @@ pub fn cast_with_options(
                 "Casting from {from_type:?} to {to_type:?} not supported",
             ))),
         },
-        (BinaryView, Binary) => cast_view_to_byte::<BinaryViewType, 
GenericBinaryType<i32>>(array),
-        (BinaryView, LargeBinary) => {
-            cast_view_to_byte::<BinaryViewType, GenericBinaryType<i64>>(array)
-        }
+        (BinaryView, _) => match to_type {
+            Binary => cast_view_to_byte::<BinaryViewType, 
GenericBinaryType<i32>>(array),
+            LargeBinary => cast_view_to_byte::<BinaryViewType, 
GenericBinaryType<i64>>(array),
+            Utf8 => {
+                let binary_arr =
+                    cast_view_to_byte::<BinaryViewType, 
GenericBinaryType<i32>>(array)?;
+                cast_binary_to_string::<i32>(&binary_arr, cast_options)
+            }
+            LargeUtf8 => {
+                let binary_arr =
+                    cast_view_to_byte::<BinaryViewType, 
GenericBinaryType<i64>>(array)?;
+                cast_binary_to_string::<i64>(&binary_arr, cast_options)
+            }
+            Utf8View => {
+                if let Some(arr) = 
array.as_any().downcast_ref::<BinaryViewArray>() {
+                    arr.clone()
+                        .to_string_view()
+                        .map(|x| Arc::new(x) as ArrayRef)
+                } else {
+                    Err(ArrowError::CastError(
+                        "Cannot cast BinaryView to StringView".to_string(),
+                    ))
+                }
+            }

Review Comment:
   As above, I think this would be more consistent if it used `as_binary_view`:
   
   
   
   ```suggestion
               Utf8View => 
Ok(Arc::new(array.as_binary_view().clone().to_string_view()?) as ArrayRef),
   ```



##########
arrow-cast/src/cast/mod.rs:
##########
@@ -5270,6 +5311,48 @@ mod tests {
         Some("repeated"),
     ];
 
+    #[test]

Review Comment:
   Stylistically, I don't understand the need for `test_between_views` -- why 
not make `_test_string_view_to_binary_view` and 
`_test_binary_view_to_string_view`, etc their own tests?



##########
arrow-cast/src/cast/mod.rs:
##########
@@ -1339,6 +1356,13 @@ pub fn cast_with_options(
                 array.as_string::<i64>().clone(),
             ))),
             Utf8View => 
Ok(Arc::new(StringViewArray::from(array.as_string::<i64>()))),
+            BinaryView => Ok(Arc::new(BinaryViewArray::from(

Review Comment:
   Same comment as above -- I think converting to `StringViewArray` and then 
calling `to_binary_view()` would be faster here



##########
arrow-cast/src/cast/mod.rs:
##########
@@ -1413,10 +1437,34 @@ pub fn cast_with_options(
                 "Casting from {from_type:?} to {to_type:?} not supported",
             ))),
         },
-        (BinaryView, Binary) => cast_view_to_byte::<BinaryViewType, 
GenericBinaryType<i32>>(array),
-        (BinaryView, LargeBinary) => {
-            cast_view_to_byte::<BinaryViewType, GenericBinaryType<i64>>(array)
-        }
+        (BinaryView, _) => match to_type {

Review Comment:
   I wonder why you chose to add a second match statement rather than just some 
more direct matches
   
   ```rust
   (BinaryView, Utf8) => ...
   ```
   
   Using direct matches would avoid having to have the catchall clause:
   
   ```rust
               _ => Err(ArrowError::CastError(format!(
                   "Casting from {from_type:?} to {to_type:?} not supported",
               ))),
   ```
   



##########
arrow-cast/src/cast/mod.rs:
##########
@@ -1229,6 +1230,13 @@ pub fn cast_with_options(
                 cast_byte_container::<BinaryType, LargeBinaryType>(&binary)
             }
             Utf8View => 
Ok(Arc::new(StringViewArray::from(array.as_string::<i32>()))),
+            BinaryView => Ok(Arc::new(BinaryViewArray::from(
+                array
+                    .as_string::<i32>()
+                    .into_iter()
+                    .map(|x| x.map(|x| x.as_bytes()))
+                    .collect::<Vec<_>>(),
+            ))),

Review Comment:
   I think this code does a string-by-string copy 
   
   I played around and I think the copy can be avoided by converting to 
StringViewArray which can reuse the buffer and then dropping the knowledge 
about utf8
   
   ```suggestion
               BinaryView => 
Ok(Arc::new(StringViewArray::from(array.as_string::<i32>()).to_binary_view())),
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to