viirya commented on PR #8991: URL: https://github.com/apache/arrow-datafusion/pull/8991#issuecomment-1915602228
> wouldn't it actually make more sense to compute the expressions prior to the networked shuffle so only 2 columns of data (`lcol_1 + lcol_2` and `rcol_1 + rcol_2`) need to be sent, rather than the 4 original columns 🤔

Hmm, apart from the join keys, I think you can still list other columns (e.g., the original 4 columns) in the selection list? So they can't always be removed from the shuffle, I think?
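
To make the trade-off concrete, here is a minimal sketch in Python. It is a simplified model, not DataFusion code: the function `columns_to_shuffle` and the derived column name `__join_key` are hypothetical names invented for illustration. It only counts which columns one side of the join would have to send through the shuffle, depending on whether the key expression is computed beforehand.

```python
def columns_to_shuffle(join_key_columns, selected_columns, precompute_keys):
    """Return the set of columns one join side must send through the shuffle.

    join_key_columns: columns referenced by this side's join key expression
    selected_columns: columns of this side that appear in the selection list
    precompute_keys:  if True, the key expression is evaluated before the
                      shuffle and sent as one derived column, "__join_key"
                      (a hypothetical name used only in this sketch)
    """
    if precompute_keys:
        # The key inputs are replaced by the derived column, but anything
        # in the selection list still has to travel with each row.
        return {"__join_key"} | set(selected_columns)
    # Otherwise the raw key inputs are shuffled along with the selected columns.
    return set(join_key_columns) | set(selected_columns)


left_keys = ["lcol_1", "lcol_2"]

# Key columns are not selected: precomputing shrinks the shuffle to one column.
only_key = columns_to_shuffle(left_keys, [], precompute_keys=True)

# Key columns are also selected: they cannot be dropped from the shuffle.
with_selection = columns_to_shuffle(left_keys, left_keys, precompute_keys=True)
without_precompute = columns_to_shuffle(left_keys, left_keys, precompute_keys=False)
```

Under this model, precomputing only reduces the shuffled data when the key inputs are not otherwise needed downstream; when they are selected, the derived key column is sent in addition to them.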
