nkarpov opened a new issue, #6051: URL: https://github.com/apache/arrow-datafusion/issues/6051
### Is your feature request related to a problem or challenge?

It's useful to project the source input file of each data row to support file-aware operations, for example in storage frameworks (https://github.com/delta-io/delta-rs/issues/850). Spark provides this as a built-in function: https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.input_file_name.html

There was prior work before the repository split, but it appears to have lost momentum:

- https://github.com/apache/arrow/pull/9944
- https://github.com/apache/arrow/pull/9976
- https://github.com/apache/arrow/issues/18601

Based on the discussion in those PRs and issues, there seemed to be consensus that this feature belongs in DataFusion rather than Arrow, so I'm creating an issue here.

### Describe the solution you'd like

A built-in function, `input_file_name()`, supported in both the SQL and DataFrame APIs, that returns, as a string, the name of the file from which the row was originally scanned.

### Describe alternatives you've considered

_No response_

### Additional context

_No response_
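As a rough illustration of the requested semantics (not DataFusion's API or implementation), here is a minimal Python sketch that assumes a scan over in-memory CSV "files". The names `scan_with_input_file_name`, `files_containing`, and the file names are hypothetical. Each scanned row carries the path of the file it was read from, which is the information a file-aware operation, such as finding which files contain rows that need rewriting, would rely on.

```python
import csv
import io
from typing import Callable, Dict, Iterator, List

Row = Dict[str, str]


def scan_with_input_file_name(files: Dict[str, str]) -> Iterator[Row]:
    """Hypothetical scan: yield every CSV row tagged with the file it came from."""
    for path, contents in files.items():
        for row in csv.DictReader(io.StringIO(contents)):
            # Extra column playing the role of input_file_name() for this row.
            row["input_file_name"] = path
            yield row


def files_containing(rows: List[Row], predicate: Callable[[Row], bool]) -> List[str]:
    """Hypothetical file-aware operation: which files hold at least one matching row?"""
    return sorted({row["input_file_name"] for row in rows if predicate(row)})


# In-memory stand-ins for data files (names and contents are made up).
files = {
    "part-0.csv": "id,value\n1,a\n2,b\n",
    "part-1.csv": "id,value\n3,c\n",
}

rows = list(scan_with_input_file_name(files))
for row in rows:
    print(row)
print(files_containing(rows, lambda r: r["id"] in {"2", "3"}))
```

In an engine like DataFusion, the equivalent value would presumably be attached during the file scan itself, since that is the only point where the originating file is still known for each batch of rows.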
