Jefffrey commented on code in PR #8912:
URL: https://github.com/apache/arrow-rs/pull/8912#discussion_r2553988287


##########
arrow-cast/src/cast/dictionary.rs:
##########
@@ -28,111 +28,92 @@ pub(crate) fn dictionary_cast<K: ArrowDictionaryKeyType>(
 ) -> Result<ArrayRef, ArrowError> {
     use DataType::*;
 
-    match to_type {
-        Dictionary(to_index_type, to_value_type) => {
-            let dict_array = array

Review Comment:
   I moved this inner logic into a separate function to be similar to how the 
other arms delegate to functions for the full logic.



##########
arrow-cast/src/cast/dictionary.rs:
##########
@@ -157,21 +138,17 @@ fn view_from_dict_values<K: ArrowDictionaryKeyType, T: 
ByteViewType, V: ByteArra
             }
         }
     }
-    Ok(builder.finish())
+    Ok(Arc::new(builder.finish()))
 }
 
-// Unpack a dictionary where the keys are of type <K> into a flattened array 
of type to_type
-pub(crate) fn unpack_dictionary<K>(
-    array: &dyn Array,
+// Unpack a dictionary into a flattened array of type to_type
+pub(crate) fn unpack_dictionary<K: ArrowDictionaryKeyType>(
+    array: &DictionaryArray<K>,

Review Comment:
   Moved the cast to dictionary array to the parent function 
(`dictionary_cast()`) so we don't need to always cast in each of the 
specialized functions.



##########
arrow-cast/src/cast/dictionary.rs:
##########
@@ -28,111 +28,92 @@ pub(crate) fn dictionary_cast<K: ArrowDictionaryKeyType>(
 ) -> Result<ArrayRef, ArrowError> {
     use DataType::*;
 
-    match to_type {
-        Dictionary(to_index_type, to_value_type) => {
-            let dict_array = array
-                .as_any()
-                .downcast_ref::<DictionaryArray<K>>()
-                .ok_or_else(|| {
-                    ArrowError::ComputeError(
-                        "Internal Error: Cannot cast dictionary to 
DictionaryArray of expected type".to_string(),
-                    )
-                })?;
+    let array = array.as_dictionary::<K>();
+    let from_child_type = array.values().data_type();
+    match (from_child_type, to_type) {
+        (_, Dictionary(to_index_type, to_value_type)) => {
+            dictionary_to_dictionary_cast(array, to_index_type, to_value_type, 
cast_options)
+        }
+        // `unpack_dictionary` can handle Utf8View/BinaryView types, but 
incurs unnecessary data
+        // copy of the value buffer. Fast path which avoids copying underlying 
values buffer.
+        // TODO: handle LargeUtf8/LargeBinary -> View (need to check offsets 
can fit)
+        // TODO: handle cross types (String -> BinaryView, Binary -> 
StringView)
+        //       (need to validate utf8?)
+        (Utf8, Utf8View) => view_from_dict_values::<K, Utf8Type, 
StringViewType>(
+            array.keys(),
+            array.values().as_string::<i32>(),
+        ),
+        (Binary, BinaryView) => view_from_dict_values::<K, BinaryType, 
BinaryViewType>(
+            array.keys(),
+            array.values().as_binary::<i32>(),
+        ),

Review Comment:
   This is the main change; we check the values beforehand to see if it is 
valid for the fast path, which was missing before (and the assumption lead to 
an error). I've intentionally kept the behaviour as intended (limited the fast 
path only for `Dictionary<Utf8>` -> `Utf8View`, `Dictionary<Binary>` -> 
`BinaryView`) since was mainly interested in just fixing the cast bug. Left 
comments for potentially extending this fast path to be valid for more 
combinations (can raise issues for this).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to