viirya commented on code in PR #1033:
URL: https://github.com/apache/datafusion-comet/pull/1033#discussion_r1813414655


##########
native/core/src/execution/operators/copy.rs:
##########
@@ -262,7 +262,13 @@ fn copy_or_unpack_array(array: &Arc<dyn Array>, mode: 
&CopyMode) -> Result<Array
     match array.data_type() {
         DataType::Dictionary(_, value_type) => {
             let options = CastOptions::default();
-            cast_with_options(array, value_type.as_ref(), &options)
+            // We need to copy the array after `cast` because arrow-rs `take` 
kernel which is used
+            // to unpack dictionary array might reuse the input array's null 
buffer.
+            Ok(copy_array(&cast_with_options(

Review Comment:
   If https://github.com/apache/arrow-rs/pull/6616 can merge, we don't need 
this extra copy.
   
   This will do extra copy on data buffer too, although the issue happens on 
null buffer only.
   
   Actually we can have custom unpack dictionary and `take` implementation to 
only copy null buffer, but I put this simply for now with the expectation that 
the fix at arrow-rs could be merged eventually.
   
   If the arrow-rs patch is blocked on merging eventually, I will open another 
PR to have custom unpack dictionary and take implementation to reduce the copy 
cost here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to