Jefffrey commented on code in PR #9768:
URL: https://github.com/apache/arrow-rs/pull/9768#discussion_r3115329684
##########
arrow-cast/src/cast/dictionary.rs:
##########
@@ -47,6 +44,35 @@ pub(crate) fn dictionary_cast<K: ArrowDictionaryKeyType>(
array.keys(),
array.values().as_binary::<i32>(),
),
+ // LargeUtf8/LargeBinary -> View: fast path only when i64 offsets fit in u32 (buffer < 4GiB).
+ // If the buffer is too large, fall back to the general path.
+ (LargeUtf8, Utf8View) => {
+ let values = array.values().as_string::<i64>();
+ if values.values().len() < u32::MAX as usize {
+ view_from_dict_values::<K, LargeUtf8Type, StringViewType>(array.keys(), values)
+ } else {
+ unpack_dictionary(array, to_type, cast_options)
+ }
+ }
+ (LargeBinary, BinaryView) => {
+ let values = array.values().as_binary::<i64>();
+ if values.values().len() < u32::MAX as usize {
+ view_from_dict_values::<K, LargeBinaryType, BinaryViewType>(array.keys(), values)
+ } else {
+ unpack_dictionary(array, to_type, cast_options)
+ }
+ }
+ // Cross casts: Utf8 -> BinaryView is always zero-copy safe (valid UTF-8 is valid binary).
+ (Utf8, BinaryView) => view_from_dict_values::<K, Utf8Type, BinaryViewType>(
+ array.keys(),
+ array.values().as_string::<i32>(),
+ ),
+ // Cross cast: Binary -> Utf8View requires UTF-8 validation of the dictionary values.
+ (Binary, Utf8View) => binary_dict_to_string_view::<K>(
Review Comment:
I think this arm specifically should be benchmarked, since it introduces new
logic (UTF-8 validation of the dictionary values) that the other arms don't have.
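
To make the request concrete, here is a rough, std-only sketch of the kind of
measurement meant here. It is not the PR's code: `count_valid_utf8` and
`time_it` are hypothetical names, and the validation loop is only a stand-in
for whatever `binary_dict_to_string_view` does internally. A real benchmark
would presumably live alongside the existing cast benchmarks and use the
project's usual harness rather than `Instant`.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the validation step of the Binary -> Utf8View arm:
/// checks each dictionary value once and returns how many are valid UTF-8.
fn count_valid_utf8(values: &[Vec<u8>]) -> usize {
    values
        .iter()
        .filter(|v| std::str::from_utf8(v).is_ok())
        .count()
}

/// Hypothetical timing helper: runs `f` `iters` times and returns the last
/// result together with the total elapsed wall-clock time.
fn time_it<F: FnMut() -> usize>(iters: u32, mut f: F) -> (usize, Duration) {
    let start = Instant::now();
    let mut last = 0;
    for _ in 0..iters {
        // black_box keeps the optimizer from discarding the work being timed.
        last = black_box(f());
    }
    (last, start.elapsed())
}
```

Calling `time_it(1_000, || count_valid_utf8(&values))` on a representative set
of dictionary values, and doing the same for the path through
`unpack_dictionary`, would show whether the new arm is actually a win.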
##########
arrow-cast/src/cast/dictionary.rs:
##########
@@ -47,6 +44,35 @@ pub(crate) fn dictionary_cast<K: ArrowDictionaryKeyType>(
array.keys(),
array.values().as_binary::<i32>(),
),
+ // LargeUtf8/LargeBinary -> View: fast path only when i64 offsets fit in u32 (buffer < 4GiB).
+ // If the buffer is too large, fall back to the general path.
+ (LargeUtf8, Utf8View) => {
+ let values = array.values().as_string::<i64>();
+ if values.values().len() < u32::MAX as usize {
Review Comment:
This check reads a little odd to me: if the offsets don't fit in u32, couldn't
`unpack_dictionary` fail as well?
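
For context, a non-inlined view in the Arrow view layout stores its buffer
offset as a 32-bit value, which is why the fast path is guarded on the size of
the values buffer. A minimal std-only sketch of that constraint follows; the
helper names `offsets_fit_in_u32` and `to_view_offset` are hypothetical and do
not exist in arrow-rs. Whether the general path hits the same limit is exactly
the open question and is not answered by this sketch.

```rust
/// Hypothetical helper (not part of arrow-rs): true when a values buffer of
/// `buffer_len` bytes passes the same strict `<` comparison used in the diff.
fn offsets_fit_in_u32(buffer_len: usize) -> bool {
    buffer_len < u32::MAX as usize
}

/// Hypothetical helper: converts a single i64 offset (as used by
/// LargeUtf8/LargeBinary) into the u32 offset a view can hold, or `None` if
/// the offset is negative or too large.
fn to_view_offset(offset: i64) -> Option<u32> {
    u32::try_from(offset).ok()
}
```

If the fallback also has to narrow i64 offsets to u32 somewhere, it would fail
on the same inputs, and the `else` branch would only defer the error.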