andygrove commented on PR #4591: URL: https://github.com/apache/datafusion-comet/pull/4591#issuecomment-4855232345
I reviewed the code from a performance angle and there are a few opportunities, but nothing that should block landing the functional version. I'd suggest we merge this first and do the optimizations as follow-ups. I filed #4781 to track them:

- Read path: hoist `UnsafeProjection.create` out of the per-batch loop, look at reducing the per-scan deep copy driven by `arrow_ffi_safe = false`, and consider an optional uncompressed cache format.
- Write path: specialize `computeStats` per column to stop boxing every value, and copy string bounds only on update rather than copying every value.

The trivial one is the `UnsafeProjection` hoist. The rest can come later.
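
To illustrate the write-path item about string bounds, here is a minimal Java sketch of the "copy only on update" idea. It is not the code in this PR: the class `StringBounds`, its methods, and the `copies` counter are hypothetical names invented for illustration, and the unsigned byte-wise comparison is an assumption about the ordering the real statistics would use.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical min/max tracker for a string column (illustration only). */
final class StringBounds {
    private byte[] lower; // owned copy of the smallest value seen so far
    private byte[] upper; // owned copy of the largest value seen so far
    int copies = 0;       // counts defensive copies, only to make the effect visible

    /** Observe one value; the caller may reuse `value`'s buffer afterwards. */
    void update(byte[] value) {
        // Compare against the current bounds first and clone only when a
        // bound actually changes, instead of cloning every incoming value.
        if (lower == null || Arrays.compareUnsigned(value, lower) < 0) {
            lower = value.clone();
            copies++;
        }
        if (upper == null || Arrays.compareUnsigned(value, upper) > 0) {
            upper = value.clone();
            copies++;
        }
    }

    String lowerString() { return new String(lower, StandardCharsets.UTF_8); }
    String upperString() { return new String(upper, StandardCharsets.UTF_8); }

    public static void main(String[] args) {
        StringBounds bounds = new StringBounds();
        byte[] reused = new byte[1]; // simulates a buffer the reader overwrites per row
        for (char c : "mazkb".toCharArray()) {
            reused[0] = (byte) c;
            bounds.update(reused);
        }
        System.out.println(bounds.lowerString() + " " + bounds.upperString()
                + " copies=" + bounds.copies);
    }
}
```

Because the bounds hold their own copies, reusing the input buffer between rows stays safe, while values that do not move either bound cost only a comparison.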
