[PR] [VL] Fix input_file_name results empty string [incubator-gluten]

via GitHub Fri, 19 Jul 2024 01:59:06 -0700


zml1206 opened a new pull request, #6517:
URL: https://github.com/apache/incubator-gluten/pull/6517


   ## What changes were proposed in this pull request?
   Spark's implementation of input_file_name stashes the current file name in 
a thread local and reads it back when the function is evaluated. 
   If the plan from the scan up to the `Project` containing input_file_name 
includes a transformer node, input_file_name returns an empty string.
   For example, reading a Delta Lake table requires a union of the checkpoint 
parquet file and the json files, followed by an order by `input_file_name` to 
get the parquet data files; with the empty result, the parquet file list is 
wrong.
   So we should either push input_file_name down into the transformer scan or 
add a project directly above the fallback scan.
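
The failure mode described above can be modelled in a few lines of plain Java. This is only a sketch under stated assumptions, not Spark or Gluten code: `CURRENT_FILE`, `vanillaScanThenProject`, `offloadedScanThenProject`, and `scanWithFileNameColumn` are hypothetical names, and the "offloaded" scan is assumed never to write the thread local that the function reads.

```java
import java.util.ArrayList;
import java.util.List;

public class InputFileNameSketch {
    // Hypothetical stand-in for the per-thread holder of the current file name.
    // It defaults to the empty string, which is what the function returns
    // when nothing has been stashed.
    static final ThreadLocal<String> CURRENT_FILE = ThreadLocal.withInitial(() -> "");

    // Models input_file_name(): it only reads the thread local.
    static String inputFileName() {
        return CURRENT_FILE.get();
    }

    // Vanilla path: the scan stashes the file name before rows are projected,
    // so the projection sees it.
    static List<String> vanillaScanThenProject(String file, int rows) {
        List<String> out = new ArrayList<>();
        CURRENT_FILE.set(file);
        try {
            for (int i = 0; i < rows; i++) {
                out.add(inputFileName());
            }
        } finally {
            CURRENT_FILE.remove();
        }
        return out;
    }

    // Assumed offloaded path: the scan never touches the thread local,
    // so the projection reads the default empty string.
    static List<String> offloadedScanThenProject(String file, int rows) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < rows; i++) {
            out.add(inputFileName());
        }
        return out;
    }

    // Idea behind the fix: produce the file name as a column at the scan
    // (or in a project directly above it) instead of reading it later.
    static List<String> scanWithFileNameColumn(String file, int rows) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < rows; i++) {
            out.add(file);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(vanillaScanThenProject("part-0.parquet", 2));
        System.out.println(offloadedScanThenProject("part-0.parquet", 2));
        System.out.println(scanWithFileNameColumn("part-0.parquet", 2));
    }
}
```

In this model the second call yields empty strings because nothing ever set the thread local, while the third yields the file name because the value travels with the rows rather than through thread state.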
   
   ## How was this patch tested?
   
   Unit tests (UT).
   
   


