erratic-pattern commented on issue #9873: URL: https://github.com/apache/datafusion/issues/9873#issuecomment-2096703188
Hey, I am interested in helping with this. Maybe @peter-toth and I can divide our efforts here? Let me know what you've worked on so far, and I can figure out how to help.

One thing I see in particular that isn't directly cloning the `LogicalPlan`s and `Expr`s, but may be putting pressure on the global allocator, is the `Identifiers` in the [ExprSet](https://github.com/apache/datafusion/blob/2bbfbdf6699907fd8ec094f5f5600af7fe13946b/datafusion/optimizer/src/common_subexpr_eliminate.rs#L47), which are represented as `String`s produced by `Display`ing the `Expr`s. Assuming that the new zero-copy implementation will continue using the `ExprSet`, maybe I could look into efficiently hashing subexpressions to produce numeric identifiers that are cheaper to copy.
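To make the idea of numeric identifiers concrete, here is a minimal sketch of bottom-up hashing. It does not use DataFusion's types: `ToyExpr`, `ExprId`, and `identify` are hypothetical names invented for illustration, and `DefaultHasher` is just the standard library hasher, not a recommendation. Each node's id is derived from its own tag plus the ids of its children, so no string is built and the id is a plain `u64` that is trivial to copy.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Toy stand-in for an expression tree (hypothetical, not DataFusion's `Expr`).
#[derive(Debug, Clone)]
enum ToyExpr {
    Column(String),
    Literal(i64),
    Binary { op: char, left: Box<ToyExpr>, right: Box<ToyExpr> },
}

/// Hypothetical numeric identifier that would replace the `String` form.
type ExprId = u64;

/// Walks `expr` bottom-up, returns its id, and counts how often each id occurs.
/// A parent hashes its children's ids rather than re-walking their subtrees.
fn identify(expr: &ToyExpr, counts: &mut HashMap<ExprId, usize>) -> ExprId {
    let mut hasher = DefaultHasher::new();
    match expr {
        ToyExpr::Column(name) => {
            0u8.hash(&mut hasher); // variant tag keeps variants distinct
            name.hash(&mut hasher);
        }
        ToyExpr::Literal(value) => {
            1u8.hash(&mut hasher);
            value.hash(&mut hasher);
        }
        ToyExpr::Binary { op, left, right } => {
            // Children are identified first so their ids can be reused here.
            let left_id = identify(left, counts);
            let right_id = identify(right, counts);
            2u8.hash(&mut hasher);
            op.hash(&mut hasher);
            left_id.hash(&mut hasher);
            right_id.hash(&mut hasher);
        }
    }
    let id = hasher.finish();
    *counts.entry(id).or_insert(0) += 1;
    id
}
```

For an expression such as `(a + b) * (a + b)`, both `a + b` subtrees receive the same id, so its count reaches 2 and it shows up as a candidate common subexpression. Two different expressions can collide on the same 64-bit hash, so a real implementation would presumably still need an equality check before treating matching ids as the same expression.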
