tustvold commented on code in PR #6752:
URL: https://github.com/apache/arrow-rs/pull/6752#discussion_r1851067294


##########
arrow-cast/src/cast/mod.rs:
##########
@@ -759,6 +763,12 @@ pub fn cast_with_options(
                 "Casting from type {from_type:?} to dictionary type {to_type:?} not supported",
             ))),
         },
+        (RunEndEncoded(re_t, _dt), _) => match re_t.data_type() {

Review Comment:
   > this is also the case in Parquet's dictionary run length encoding
   
   Parquet uses run-length encoding, whereas this is run-end encoding, and consequently the primitive type constrains the maximum length of the array, as opposed to the maximum length of a single run.
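   To illustrate the distinction, here is a minimal stdlib-only sketch of decoding a run-end-encoded array (this is not the arrow-rs `RunArray` API; the function name and signature are hypothetical). Each `run_ends[i]` is the exclusive *logical end index* of run `i`, so the last run end equals the total logical length — which is why an `i16` run-end type caps the whole array at `i16::MAX` elements, unlike run-length encoding, where the same type would only cap the length of one run.

   ```rust
   /// Materialize a run-end-encoded array (illustrative sketch only).
   /// `run_ends[i]` is the exclusive logical end offset of run i, so the
   /// final entry equals the total logical length of the decoded array.
   fn decode_run_ends<T: Clone>(run_ends: &[i16], values: &[T]) -> Vec<T> {
       let mut out = Vec::new();
       let mut start = 0i16;
       for (&end, value) in run_ends.iter().zip(values) {
           // Run i covers logical positions start..end.
           for _ in start..end {
               out.push(value.clone());
           }
           start = end;
       }
       out
   }

   fn main() {
       // Runs: "a" x3, then "b" x2 -> logical length 5 == last run end.
       let decoded = decode_run_ends(&[3, 5], &["a", "b"]);
       assert_eq!(decoded, vec!["a", "a", "a", "b", "b"]);
       println!("{decoded:?}");
   }
   ```

   Note that with `i16` run ends the decoded array here can never exceed 32767 logical elements, whereas a run-*length* encoding with `i16` lengths could represent arbitrarily long arrays as long as no single run exceeded 32767.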
   
   I'm sorry that we can't just add more codegen, but it has been a perennial issue for us that the full Arrow specification is a combinatorial explosion, and so we have to pick some compute-optimised versions and accept that space-optimised variants may incur a slight performance penalty. We should probably do a better job of documenting this.
   
   > code size.
   
   It's actually build time that is the biggest pain point; there was a time when half the build time of the entire workspace was just the dictionary comparison kernels 😅


