RyanMarcus commented on code in PR #6752:
URL: https://github.com/apache/arrow-rs/pull/6752#discussion_r1851025127


##########
arrow-cast/src/cast/mod.rs:
##########
@@ -759,6 +763,12 @@ pub fn cast_with_options(
                 "Casting from type {from_type:?} to dictionary type 
{to_type:?} not supported",
             ))),
         },
+        (RunEndEncoded(re_t, _dt), _) => match re_t.data_type() {

Review Comment:
   Just to be clear, the standard only allows `Int16`, `Int32`, and `Int64` as run-end types, not `Int8` (which is reflected in the code). I think `Int16` and the other smaller integer types are actually the most common in practice (this is also the case in Parquet's dictionary run-length encoding).
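
   (For context, a minimal sketch of validating the run-end type against the three spec-allowed integer types; the helper name is hypothetical and is not code from this PR:)

```rust
use arrow_schema::DataType;

// Hypothetical helper: the Arrow spec restricts run-end types to
// Int16, Int32, and Int64, so anything else is rejected up front.
fn check_run_end_type(re_t: &DataType) -> Result<(), String> {
    match re_t {
        DataType::Int16 | DataType::Int32 | DataType::Int64 => Ok(()),
        other => Err(format!("invalid run-end type {other:?}")),
    }
}
```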
   
   I didn't realize that even this amount of extra codegen was an issue. If I am literally the only user of this feature, we simply shouldn't do this at all. Honestly, it won't even solve my problem: a 30% to 2x slowdown on some types means I will have to implement the full thing in my own code anyway. Luckily, my project is significantly less constrained by code size.
   
   Probably the "right thing" to do is somehow split up cast into pieces, so 
people can opt into what they need, either with feature flags or with more 
subcrates, but I think that's a pretty large refactor.
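
   (A hypothetical sketch of the feature-flag idea; the `ree_cast` feature name is invented for illustration and does not exist in arrow-rs:)

```rust
// Hypothetical: gate the RunEndEncoded cast path behind an opt-in Cargo
// feature so the extra monomorphized code is only generated when asked for.
#[cfg(feature = "ree_cast")]
fn supports_ree_cast() -> bool {
    true
}

#[cfg(not(feature = "ree_cast"))]
fn supports_ree_cast() -> bool {
    // Without the feature, callers keep the existing "not supported" error path.
    false
}
```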


