adriangb opened a new issue, #14993 (https://github.com/apache/datafusion/issues/14993):
For a query like `select expensive_thing(col1) from data`, I would like to pre-process `expensive_thing(col1)` to speed it up. The easiest approach I can think of is to pre-compute the expression and save it as a column with a specific name, such as `_expensive_thing__col1`. But then I have to hardcode this column into my schema, the hardcoded expressions need to be the same for every file, etc.

To me, an ideal solution would be to push the expression down to the file-reading level, so that I can check: "does `_some_other_expensive_expr__col38` exist in this file? If so, read that; otherwise, read `col38` and compute the expression."

The tricky part is that I'd want to do this on a per-file basis: depending on the data, different expression/column combinations would be pre-computed, and it is prohibitive to put all of them in the schema shared by all files.
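To make the per-file decision concrete, here is a minimal, std-only sketch of the check described above. It is not a DataFusion API: `PushedExpr`, `ReadPlan`, `plan_for_file`, and the `_<func>__<column>` naming convention are all hypothetical, chosen only to illustrate how a reader could pick, file by file, between a pre-computed column and the raw column plus evaluation.

```rust
use std::collections::HashSet;

/// Hypothetical pushed-down expression: one function applied to one
/// input column, e.g. `expensive_thing(col1)`.
struct PushedExpr<'a> {
    func: &'a str,
    column: &'a str,
}

/// Hypothetical per-file decision the reader would act on.
#[derive(Debug, PartialEq)]
enum ReadPlan {
    /// The file already has the pre-computed column: read it directly.
    ReadPrecomputed(String),
    /// Fall back to reading the raw column and evaluating the expression.
    ReadAndCompute { column: String, func: String },
}

/// Assumed naming convention for pre-computed columns: `_<func>__<column>`.
fn precomputed_name(expr: &PushedExpr) -> String {
    format!("_{}__{}", expr.func, expr.column)
}

/// Decide using only this file's own column names, so no shared
/// table schema has to list every pre-computed column.
fn plan_for_file(file_columns: &HashSet<&str>, expr: &PushedExpr) -> ReadPlan {
    let name = precomputed_name(expr);
    if file_columns.contains(name.as_str()) {
        ReadPlan::ReadPrecomputed(name)
    } else {
        ReadPlan::ReadAndCompute {
            column: expr.column.to_string(),
            func: expr.func.to_string(),
        }
    }
}

fn main() {
    let expr = PushedExpr { func: "expensive_thing", column: "col1" };

    // File A was written with the pre-computed column; file B was not.
    let file_a: HashSet<&str> = HashSet::from(["col1", "_expensive_thing__col1"]);
    let file_b: HashSet<&str> = HashSet::from(["col1"]);

    println!("{:?}", plan_for_file(&file_a, &expr));
    println!("{:?}", plan_for_file(&file_b, &expr));
}
```

Because the lookup consults each file's own columns, two files in the same table can take different paths without either column appearing in the shared schema.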