alamb commented on issue #6051: URL: https://github.com/apache/arrow-datafusion/issues/6051#issuecomment-1792647034
> What is the correct semantics of this input_file_name() function I do not know.

However, the use case of "that returns a string of the file from which the row was originally scanned" suggests it would be `i` in your list: "Return all the files registered by a table"

The [spark docs](https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.input_file_name.html) say

> Creates a string column for the file name of the current Spark task.

I am not sure how that maps to what file is processed (aka does the same task process multiple input files?)

As for implementation, perhaps it would be possible to model how partition columns are injected, though as you will find that is non-trivial to implement.
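For reference, a minimal PySpark sketch of the behavior the Spark docs describe (the table path and the `source_file` column name here are made up for illustration):

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import input_file_name

spark = SparkSession.builder.getOrCreate()

# Hypothetical directory of Parquet files; each output row is tagged with
# the path of the file the current task scanned it from.
df = (
    spark.read.parquet("/tmp/example_table")
         .withColumn("source_file", input_file_name())
)

# Show which distinct files contributed rows.
df.select("source_file").distinct().show(truncate=False)
```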
