ethan-tyler commented on PR #20071: URL: https://github.com/apache/datafusion/pull/20071#issuecomment-3825022132
> I think it should be much simpler than this: create a UDF and pass it through `ProjectionExprs::transform_exprs(|expr| expr.transform(|expr| // if expr is our ScalarUDF, replace with literal filename))`

Thanks @adriangb, really appreciate the feedback!

I've been working on implementing your approach: `input_file_name()` stays a 0-arg scalar UDF, and `ProjectionOpener::open(partitioned_file)` calls `expr.transform(...)` to replace the `ScalarFunctionExpr` with `Literal(Utf8(partitioned_file.location))`. Does this direction look right to you?

One question on plumbing: in many plans (e.g. `SELECT input_file_name() FROM t WHERE a > 1`), the top `ProjectionExec` can't be pushed through the `FilterExec` without dropping columns the filter needs. The UDF then stays above the scan and hits the runtime error guard.

For this first PR, which is better?

1. Minimal opener rewrite only: works only when the projection naturally pushes into the scan.
2. Include analyzer/optimizer/planner glue: mark the scan as needing the reserved column and inject the UDF into the file source projection, so it works with filters/sorts/limits.
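For reference, here is a minimal, self-contained sketch of the per-file rewrite described above. It uses a toy expression enum instead of DataFusion's real physical expression types, so `Expr`, `transform_up`, `bind_input_file_name`, and `has_unbound_input_file_name` are all hypothetical names, not DataFusion APIs. It only illustrates the shape of the idea: a bottom-up rewrite that swaps a 0-arg `input_file_name()` call for a string literal holding the file location, plus a check that shows why an expression that never reaches the opener still contains the unbound call.

```rust
// Toy model of a physical expression tree (hypothetical; NOT DataFusion's
// PhysicalExpr). It only mirrors the shape of the rewrite being discussed.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    /// Reference to a column produced by the scan.
    Column(String),
    /// A UTF-8 string literal, standing in for `Literal(Utf8(..))`.
    LiteralUtf8(String),
    /// A scalar function call, e.g. `input_file_name()` or `concat(a, b)`.
    ScalarFunction { name: String, args: Vec<Expr> },
}

/// Bottom-up rewrite: rewrite the children first, then apply `f` to the node.
pub fn transform_up(expr: Expr, f: &impl Fn(Expr) -> Expr) -> Expr {
    let rewritten = match expr {
        Expr::ScalarFunction { name, args } => Expr::ScalarFunction {
            name,
            args: args.into_iter().map(|a| transform_up(a, f)).collect(),
        },
        leaf => leaf,
    };
    f(rewritten)
}

/// Hypothetical per-file step (what the opener would do for each file):
/// replace every 0-arg `input_file_name()` call with a literal location.
pub fn bind_input_file_name(expr: Expr, file_location: &str) -> Expr {
    transform_up(expr, &|e: Expr| match e {
        Expr::ScalarFunction { ref name, ref args }
            if name == "input_file_name" && args.is_empty() =>
        {
            Expr::LiteralUtf8(file_location.to_string())
        }
        other => other,
    })
}

/// True if an unbound `input_file_name()` call is still present, i.e. the
/// expression never went through the per-file rewrite (for example because
/// it stayed above a filter) and would hit a runtime error guard if evaluated.
pub fn has_unbound_input_file_name(expr: &Expr) -> bool {
    match expr {
        Expr::ScalarFunction { name, args } => {
            (name == "input_file_name" && args.is_empty())
                || args.iter().any(has_unbound_input_file_name)
        }
        _ => false,
    }
}
```

In this toy model, only expressions that are actually handed to `bind_input_file_name` get bound, which mirrors option 1: the rewrite alone cannot help a projection that stays above the filter. Option 2 would amount to making sure the call always lands in the expression set the opener rewrites.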
