alamb commented on issue #6051:
URL: https://github.com/apache/arrow-datafusion/issues/6051#issuecomment-1792647034

   > What is the correct semantics of this input_file_name() function
   
   I do not know. However, the stated use case of a function "that returns a string of the file from which the row was originally scanned" suggests it would be `i` in your list: "Return all the files registered by a table".
   
   The [Spark docs](https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.input_file_name.html) say
   
   > Creates a string column for the file name of the current Spark task.
   
   I am not sure how that maps to which file is reported (i.e. does the same task process multiple input files?)
   
   
   
   As for implementation, perhaps it would be possible to model it on how partition columns are injected, though, as you will find, that is non-trivial to implement.
   
   

