alamb commented on issue #14993: URL: https://github.com/apache/datafusion/issues/14993#issuecomment-2703726306
> So it sounds like the main complexity is going to be the TableProvider having to take ownership of applying the expression.

> Given these are DataFusion scalar expressions (rather than any relational algebra), can the implementation not just invoke the expression as a fallback?

I agree with @gatesn -- I don't think adding new columns based on expressions has to be all that complex (you can already do it via a [SchemaAdapter](https://docs.rs/datafusion/latest/datafusion/datasource/schema_adapter/trait.SchemaAdapter.html#tymethod.map_schema) / [SchemaMapper](https://docs.rs/datafusion/latest/datafusion/datasource/schema_adapter/trait.SchemaMapper.html)). There is a similar use case for filling in new columns with default values rather than NULL.

> With Vortex, we've gone one step further and the scan accepts projection: Expr, filter: Option<Expr> where projection can arbitrarily select columns and apply scalar expressions to them. While this may be a little too general for DataFusion, it works well provided the system has good support for struct types and expressions for manipulating them.

The normal DataFusion filter pushdown API allows table providers to report which expressions they can handle, which means most providers can ignore filters unless they have code to handle them.

> However this does have the downside of pushing disproportionate complexity onto the TableProvider for the simple case of projecting out a few columns.

Again, I think simple TableProviders can use something like SchemaAdapter for this use case.
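
To make the "report what you can handle, let the engine do the rest" idea concrete, here is a minimal sketch in plain Rust. It is only an illustration: `ToyFilter`, `Pushdown`, `ToyProvider`, `SimpleProvider`, `SmartProvider`, and `run_query` are hypothetical names invented for this example and are not DataFusion types. The shape loosely mirrors DataFusion's `TableProvider::supports_filters_pushdown`, but the real API has different signatures and also has an inexact mode that is omitted here.

```rust
// Hypothetical stand-ins for illustration only; these are NOT DataFusion types.

/// A toy scalar predicate over a single i64 column.
#[derive(Debug, Clone, PartialEq)]
pub enum ToyFilter {
    GreaterThan(i64),
    IsEven,
}

impl ToyFilter {
    /// Evaluate the predicate against one value.
    pub fn matches(&self, v: i64) -> bool {
        match self {
            ToyFilter::GreaterThan(n) => v > *n,
            ToyFilter::IsEven => v % 2 == 0,
        }
    }
}

/// What a provider reports for each filter it is offered.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Pushdown {
    /// The provider applies the filter fully during the scan.
    Exact,
    /// The engine must apply the filter itself after the scan.
    Unsupported,
}

pub trait ToyProvider {
    /// Report, per filter, whether this provider can handle it.
    /// The default claims nothing, so simple providers need no extra code.
    fn supports(&self, filters: &[ToyFilter]) -> Vec<Pushdown> {
        vec![Pushdown::Unsupported; filters.len()]
    }

    /// Scan the data, applying only the filters that were pushed down.
    fn scan(&self, pushed: &[ToyFilter]) -> Vec<i64>;
}

/// A provider that ignores filters entirely and relies on the fallback.
pub struct SimpleProvider(pub Vec<i64>);

impl ToyProvider for SimpleProvider {
    fn scan(&self, _pushed: &[ToyFilter]) -> Vec<i64> {
        self.0.clone()
    }
}

/// A provider that handles `GreaterThan` itself and nothing else.
pub struct SmartProvider(pub Vec<i64>);

impl ToyProvider for SmartProvider {
    fn supports(&self, filters: &[ToyFilter]) -> Vec<Pushdown> {
        filters
            .iter()
            .map(|f| match f {
                ToyFilter::GreaterThan(_) => Pushdown::Exact,
                _ => Pushdown::Unsupported,
            })
            .collect()
    }

    fn scan(&self, pushed: &[ToyFilter]) -> Vec<i64> {
        self.0
            .iter()
            .copied()
            .filter(|v| pushed.iter().all(|f| f.matches(*v)))
            .collect()
    }
}

/// The "engine": push down what the provider accepts, then evaluate the
/// remaining filters itself as a fallback.
pub fn run_query(provider: &dyn ToyProvider, filters: &[ToyFilter]) -> Vec<i64> {
    let support = provider.supports(filters);
    let mut pushed = Vec::new();
    let mut residual = Vec::new();
    for (f, s) in filters.iter().zip(support) {
        match s {
            Pushdown::Exact => pushed.push(f.clone()),
            Pushdown::Unsupported => residual.push(f.clone()),
        }
    }
    provider
        .scan(&pushed)
        .into_iter()
        .filter(|v| residual.iter().all(|f| f.matches(*v)))
        .collect()
}
```

Calling `run_query` with the same filters against `SimpleProvider` and `SmartProvider` yields the same rows; the only difference is where each filter is evaluated. The same fallback shape could plausibly apply to projection expressions, which is the point under discussion in this thread.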