[GitHub] [arrow-datafusion] nkarpov opened a new issue, #6051: Add input_file_name built-in function

via GitHub Tue, 18 Apr 2023 12:40:59 -0700


nkarpov opened a new issue, #6051:
URL: https://github.com/apache/arrow-datafusion/issues/6051


   ### Is your feature request related to a problem or challenge?
   
   It's useful to project the source input file of a data row to support 
file-aware operations, for example in storage frameworks 
(https://github.com/delta-io/delta-rs/issues/850). Spark, for example, 
provides this as a built-in function: 
https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.input_file_name.html
   
   There was prior work before the repository split, but it appears to have 
lost momentum:
   
   https://github.com/apache/arrow/pull/9944
   https://github.com/apache/arrow/pull/9976
   https://github.com/apache/arrow/issues/18601
   
   Based on the conversations in the prior PRs and issues, it looks like there 
was consensus that this feature should live in DataFusion rather than Arrow, 
so I am creating an issue here.
   
   ### Describe the solution you'd like
   
   A built-in function, `input_file_name()`, available in both the SQL and 
DataFrame APIs, that returns, as a string, the name of the file from which 
the row was originally scanned.
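
To make the requested semantics concrete, here is a minimal sketch in plain Python (standard library only). It is not DataFusion code and does not reflect any existing DataFusion API: the helper `scan_with_file_name`, the `demo` function, and the CSV file names are all hypothetical, and a simple CSV reader stands in for the file scan. It only illustrates the idea that every row carries the path of the file it was read from.

```python
import csv
import os
import tempfile


def scan_with_file_name(paths):
    """Yield each CSV row as a dict, plus the path it was read from."""
    for path in paths:
        with open(path, newline="") as handle:
            for row in csv.DictReader(handle):
                # Hypothetical column name, mirroring the proposed function.
                yield {**row, "input_file_name": path}


def demo():
    """Write two small CSV files, scan them, and return (id, file) pairs."""
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for name, ids in (("a.csv", [1, 2]), ("b.csv", [3])):
            path = os.path.join(tmp, name)
            with open(path, "w", newline="") as handle:
                writer = csv.writer(handle)
                writer.writerow(["id"])
                writer.writerows([i] for i in ids)
            paths.append(path)
        return [
            (row["id"], os.path.basename(row["input_file_name"]))
            for row in scan_with_file_name(paths)
        ]
```

In an engine, the equivalent would presumably be a projection such as `SELECT id, input_file_name() FROM t`, with the scan operator supplying the path for each row; how that is wired in is left open by this issue.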
   
   ### Describe alternatives you've considered
   
   _No response_
   
   ### Additional context
   
   _No response_

