[I] Push down expression to files [datafusion]

via GitHub Mon, 03 Mar 2025 14:37:55 -0800


adriangb opened a new issue, #14993:
URL: https://github.com/apache/datafusion/issues/14993


   For the scenario of `select expensive_thing(col1) from data` I would like to 
pre-process (speed up) `expensive_thing(col1)`.
   The easiest way I can think of doing this is by pre-computing the expression 
and saving it as a column with a specific name like `_expensive_thing__col1`.
   But then I have to hardcode this column into my schema, and the hardcoded 
expressions need to be the same for every file, etc.
   
   To me an ideal solution would be to push down the expression to the file 
reading level so that I can then check "does 
`_some_other_expensive_expr__col38` exist in the file? If so, read that; 
otherwise, read `col38` and compute the expression".
   
   The tricky thing is I'd want to do this on a per-file level: depending on 
the data, different expression/column combinations would be pre-computed, and 
it's prohibitive to put them all in the schema that is shared amongst all files.
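
   To make the per-file check concrete, here is a rough sketch in plain Python. 
It does not use any DataFusion API; `precomputed_name`, `FilePlan`, 
`plan_for_file` and `evaluate` are hypothetical names made up for illustration, 
and a file is modelled as a plain dict of column name to values. The only thing 
taken from the description above is the `_<expr>__<column>` naming convention.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Set


def precomputed_name(func_name: str, column: str) -> str:
    """Name a pre-computed column would have, e.g. `_expensive_thing__col1`."""
    return f"_{func_name}__{column}"


@dataclass(frozen=True)
class FilePlan:
    """What to read from one file, and whether the expression must still run."""
    read_column: str
    compute: bool


def plan_for_file(file_columns: Set[str], func_name: str, column: str) -> FilePlan:
    """Decide, for a single file, how to obtain `func_name(column)`."""
    candidate = precomputed_name(func_name, column)
    if candidate in file_columns:
        # This file already stores the result: read it and skip the computation.
        return FilePlan(read_column=candidate, compute=False)
    if column in file_columns:
        # Fall back to reading the raw column and evaluating the expression.
        return FilePlan(read_column=column, compute=True)
    raise KeyError(f"neither {candidate!r} nor {column!r} exists in this file")


def evaluate(
    file_data: Dict[str, List[Any]],
    func_name: str,
    func: Callable[[Any], Any],
    column: str,
) -> List[Any]:
    """Apply the per-file plan to an in-memory stand-in for a file."""
    plan = plan_for_file(set(file_data), func_name, column)
    values = file_data[plan.read_column]
    return [func(v) for v in values] if plan.compute else list(values)


# Two files that share the logical column `col1`; only the first one also
# carries the pre-computed result.
file_a = {"col1": [1, 2], "_expensive_thing__col1": [10, 20]}
file_b = {"col1": [3, 4]}


def expensive_thing(x: int) -> int:
    return x * 10


result_a = evaluate(file_a, "expensive_thing", expensive_thing, "col1")
result_b = evaluate(file_b, "expensive_thing", expensive_thing, "col1")
```

   The decision is made per file from that file's own column set, so the 
shared table schema only needs `col1`; the pre-computed columns never have to 
be declared globally.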


