viirya commented on code in PR #1033:
URL: https://github.com/apache/datafusion-comet/pull/1033#discussion_r1813414655


##########
native/core/src/execution/operators/copy.rs:
##########
@@ -262,7 +262,13 @@ fn copy_or_unpack_array(array: &Arc<dyn Array>, mode: 
&CopyMode) -> Result<Array
     match array.data_type() {
         DataType::Dictionary(_, value_type) => {
             let options = CastOptions::default();
-            cast_with_options(array, value_type.as_ref(), &options)
+            // We need to copy the array after `cast` because arrow-rs `take` 
kernel which is used
+            // to unpack dictionary array might reuse the input array's null 
buffer.
+            Ok(copy_array(&cast_with_options(

Review Comment:
   If https://github.com/apache/arrow-rs/pull/6616 can merge, we don't need 
this extra copy.
   
   This will do extra copy on data buffer too, although the issue happens on 
null buffer only.
   
   Actually we can have custom unpack dictionary and take implementation to 
only copy null buffer, but I put this simply with the expectation that the fix 
at arrow-rs could be merged eventually.
   
   If it is blocked on merging eventually, I will open another PR to have 
custom unpack dictionary and take implementation to reduce the copy cost here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to