adriangb opened a new issue, #14993:
URL: https://github.com/apache/datafusion/issues/14993

   For a query such as `select expensive_thing(col1) from data`, I would like 
to speed up `expensive_thing(col1)` by pre-computing it.
   The easiest way I can think of to do this is to pre-compute the expression 
and save it as a column with a specific name, such as `_expensive_thing__col1`.
   But then I have to hardcode this column into my schema, and the hardcoded 
expressions need to be the same for every file, etc.
   
   To me, an ideal solution would be to push the expression down to the 
file-reading level, so that I can check: does 
`_some_other_expensive_expr__col38` exist in the file? If so, read that; 
otherwise, read `col38` and compute the expression.
   
   The tricky part is that I'd want to do this per file: depending on the 
data, different expression/column combinations would be pre-computed, and it's 
prohibitive to put them all in the schema that is shared among all files.
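
A minimal sketch of the per-file fallback described above, in plain Python rather than against any DataFusion API. Each file is modelled as a dict of column name to values, and the names `precomputed_name`, `read_expression`, `file_a` and `file_b` are hypothetical, introduced only for illustration. The only thing taken from the issue is the `_<function>__<column>` naming convention.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a data file: column name -> column values.
FileColumns = Dict[str, List[int]]


def precomputed_name(func_name: str, column: str) -> str:
    """Build the column name used in the issue, e.g. `_expensive_thing__col1`."""
    return f"_{func_name}__{column}"


def read_expression(
    file_columns: FileColumns,
    func_name: str,
    column: str,
    func: Callable[[int], int],
) -> List[int]:
    """Resolve `func(column)` for one file, preferring a pre-computed column."""
    name = precomputed_name(func_name, column)
    if name in file_columns:
        # This file carries the pre-computed result, so read it directly.
        return file_columns[name]
    # Otherwise read the base column and compute the expression.
    return [func(value) for value in file_columns[column]]


def expensive_thing(x: int) -> int:
    # Placeholder for a genuinely expensive function (assumption).
    return x * x


# file_a has the pre-computed column; file_b only has the base column.
file_a: FileColumns = {"col1": [1, 2, 3], "_expensive_thing__col1": [1, 4, 9]}
file_b: FileColumns = {"col1": [4, 5]}

results = [
    read_expression(f, "expensive_thing", "col1", expensive_thing)
    for f in (file_a, file_b)
]
```

The decision is made inside `read_expression` for each file separately, so the schema shared by all files only needs to expose `col1`; the pre-computed column never has to appear in it.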



