viirya commented on code in PR #1033: URL: https://github.com/apache/datafusion-comet/pull/1033#discussion_r1813414655
########## native/core/src/execution/operators/copy.rs: ########## @@ -262,7 +262,13 @@ fn copy_or_unpack_array(array: &Arc<dyn Array>, mode: &CopyMode) -> Result<Array match array.data_type() { DataType::Dictionary(_, value_type) => { let options = CastOptions::default(); - cast_with_options(array, value_type.as_ref(), &options) + // We need to copy the array after `cast` because arrow-rs `take` kernel which is used + // to unpack dictionary array might reuse the input array's null buffer. + Ok(copy_array(&cast_with_options( Review Comment: If https://github.com/apache/arrow-rs/pull/6616 can merge, we don't need this extra copy. This will do extra copy on data buffer too, although the issue happens on null buffer only. Actually we can have custom unpack dictionary and take implementation to only copy null buffer, but I put this simply with the expectation that the fix at arrow-rs could be merged eventually. If it is blocked on merging eventually, I will open another PR to have custom unpack dictionary and take implementation to reduce the copy cost here. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org