findepi commented on issue #8051: URL: https://github.com/apache/datafusion/issues/8051#issuecomment-2348333254
I am not using Ballista currently.

I realized the plan serialization concern is easy to address if we split simplify into two phases. The first is the Expr-constraint simplify (e.g. pruning arguments known to be null), which would run during the plan optimization phase. The second is a local execution simplify, which allows a function to "compile itself" into its most optimal form, with no need for serialization anymore. Ballista would need to serialize and distribute the plans between these two phases.

BTW, so far we have focused on compiling regular expressions, but we haven't thought about the memory needed to execute them. Internally, `regex::Regex::is_match` uses a synchronized pool of "caches" (regex execution scratch space). I don't know whether this is a performance problem (probably not!), but let me use it as an example. It would probably be good if, at runtime, a scalar function could have its own thread-local-like "scratch space" / "local buffer", ideally without using actual thread locals, which aren't great when DF is embedded and doesn't control thread creation.

Why am I mentioning this? I thought that if we had "scratch space" / "local buffer" support, we might not need to "compile functions" during planning at all.
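The two-phase split could look roughly like the following minimal, std-only sketch. Everything in it is hypothetical and is not DataFusion's actual API: `Expr`, `simplify`, `prepare`, and `Prepared` are all made up here. To avoid an external crate, a tiny LIKE matcher that supports only the `%` wildcard (no `_`, no escapes) stands in for regex compilation.

Phase 1 rewrites an expression using only expression-level facts, so its output is still a plain, serializable `Expr`. Phase 2 runs locally at execution time. It turns the simplified expression into a closure that carries pre-processed state and is never serialized.

```rust
/// Hypothetical, simplified expression tree standing in for a plan-level `Expr`.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    /// A string literal; `None` models SQL NULL.
    Literal(Option<String>),
    /// Reference to the (single) input string column.
    Column,
    /// `input LIKE pattern`; this sketch only understands the `%` wildcard.
    Like { input: Box<Expr>, pattern: Box<Expr> },
}

/// Phase 1 (plan time): uses only facts visible in the expression itself.
/// The result is still an `Expr`, so it could be serialized and shipped.
pub fn simplify(expr: Expr) -> Expr {
    match expr {
        Expr::Like { input, pattern } => {
            let input = simplify(*input);
            let pattern = simplify(*pattern);
            // A NULL argument makes LIKE evaluate to NULL, so prune the call.
            if input == Expr::Literal(None) || pattern == Expr::Literal(None) {
                Expr::Literal(None)
            } else {
                Expr::Like { input: Box::new(input), pattern: Box::new(pattern) }
            }
        }
        other => other,
    }
}

/// Phase 2 output: a locally "compiled" predicate; never serialized.
pub type Prepared = Box<dyn Fn(Option<&str>) -> Option<bool>>;

/// Phase 2 (execution time, on the node that runs the plan).
pub fn prepare(expr: &Expr) -> Result<Prepared, String> {
    match expr {
        Expr::Literal(None) => {
            let f: Prepared = Box::new(|_: Option<&str>| -> Option<bool> { None });
            Ok(f)
        }
        Expr::Like { input, pattern } => match (input.as_ref(), pattern.as_ref()) {
            (Expr::Column, Expr::Literal(Some(p))) => {
                // Split the pattern once here instead of once per row.
                let parts: Vec<String> = p.split('%').map(str::to_owned).collect();
                let f: Prepared =
                    Box::new(move |value: Option<&str>| value.map(|v| like_match(v, &parts)));
                Ok(f)
            }
            _ => Err("unsupported LIKE shape in this sketch".to_string()),
        },
        _ => Err("only predicates are supported in this sketch".to_string()),
    }
}

/// Matches `value` against a pattern pre-split on `%`.
fn like_match(value: &str, parts: &[String]) -> bool {
    if parts.len() == 1 {
        return value == parts[0]; // no wildcard: exact match
    }
    let (first, last) = (&parts[0], &parts[parts.len() - 1]);
    if value.len() < first.len() + last.len()
        || !value.starts_with(first.as_str())
        || !value.ends_with(last.as_str())
    {
        return false;
    }
    // Middle segments must appear, in order, between the prefix and suffix.
    let mut rest = &value[first.len()..value.len() - last.len()];
    for mid in &parts[1..parts.len() - 1] {
        match rest.find(mid.as_str()) {
            Some(pos) => rest = &rest[pos + mid.len()..],
            None => return false,
        }
    }
    true
}
```

In this arrangement, only the output of `simplify` would need to cross the wire, for example in Ballista. `prepare` runs after deserialization, so the "compiled" state never needs a serialized form.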

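The "scratch space" idea can be illustrated with another std-only sketch. It assumes the engine gives each execution stream (e.g. one per partition) its own scratch value and lends it out as `&mut` for every batch. `ScalarKernel`, `ContainsIgnoreAsciiCase`, and `Stream` are hypothetical names, not DataFusion types.

Because the stream owns the scratch exclusively, no lock or synchronized pool is needed. Likewise, `thread_local!` is not needed, no matter which thread polls the stream. The kernel itself stays immutable and can be shared across streams.

```rust
use std::sync::Arc;

/// Hypothetical kernel interface: the engine owns one `Scratch` per stream
/// and lends it mutably to every batch evaluation.
pub trait ScalarKernel {
    type Scratch: Default;
    fn eval_batch(&self, values: &[&str], scratch: &mut Self::Scratch, out: &mut Vec<bool>);
}

/// ASCII case-insensitive substring test; the needle is lowercased once.
pub struct ContainsIgnoreAsciiCase {
    needle: String,
}

impl ContainsIgnoreAsciiCase {
    pub fn new(needle: &str) -> Self {
        Self { needle: needle.to_ascii_lowercase() }
    }
}

impl ScalarKernel for ContainsIgnoreAsciiCase {
    // Reusable buffer for the lowercased value, so rows don't each allocate.
    type Scratch = String;

    fn eval_batch(&self, values: &[&str], scratch: &mut Self::Scratch, out: &mut Vec<bool>) {
        out.clear();
        for v in values {
            scratch.clear(); // keeps the capacity from earlier rows and batches
            scratch.push_str(v);
            scratch.make_ascii_lowercase();
            out.push(scratch.contains(self.needle.as_str()));
        }
    }
}

/// One execution stream: shares the kernel, owns its scratch space.
pub struct Stream<K: ScalarKernel> {
    kernel: Arc<K>,
    scratch: K::Scratch,
}

impl<K: ScalarKernel> Stream<K> {
    pub fn new(kernel: Arc<K>) -> Self {
        Self { kernel, scratch: Default::default() }
    }

    pub fn next_batch(&mut self, values: &[&str]) -> Vec<bool> {
        let mut out = Vec::with_capacity(values.len());
        self.kernel.eval_batch(values, &mut self.scratch, &mut out);
        out
    }
}
```

Here the expensive-to-build part (the lowercased needle, or in the regex case the compiled program) lives in the shared kernel. The per-execution memory lives in the stream. That split is roughly what `regex`'s internal cache pool provides, except that here ownership makes the synchronization unnecessary.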