acvictor opened a new pull request, #11459:
URL: https://github.com/apache/incubator-gluten/pull/11459

   
## What changes are proposed in this pull request?
This PR fixes an issue where the `numFiles` driver-side metric was not populated when Gluten/Velox performed file scans on Spark 4.0.
- Added a `sendDriverMetrics()` call after the if/else block in `dynamicallySelectedPartitions`, matching the behavior of the spark35 shim.
- Changed `getPartitionArray()` to use `dynamicallySelectedPartitions.filePartitionIterator` instead of listing files directly, so that the metrics initialization chain is properly triggered.
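To make the two changes concrete, here is a minimal, self-contained Scala sketch of the fixed control flow. It is a toy model, not the Gluten shim or Spark source. `PartitionDir`, `FixedScanModel`, `driverMetrics`, `postedUpdates`, and the `listFiles` constructor parameter are hypothetical stand-ins: a partition is modeled as a value plus its file sizes, and "posting" metrics just records a snapshot. The method names mirror those in the PR only to show where each call sits.

```scala
import scala.collection.mutable

// Hypothetical toy model of a file scan node; NOT the actual Gluten/Spark code.
// A partition is modeled as a partition value plus the sizes of its files.
final case class PartitionDir(value: Int, fileSizes: Seq[Long])

class FixedScanModel(
    listFiles: () => Seq[PartitionDir],
    dynamicPartitionFilter: Option[Int => Boolean]) {

  // Hypothetical stand-in for the driver-side SQL metrics.
  val driverMetrics: mutable.Map[String, Long] =
    mutable.Map("numFiles" -> 0L, "filesSize" -> 0L)
  // Hypothetical record of every snapshot "posted" to the metrics system.
  val postedUpdates: mutable.Buffer[Map[String, Long]] =
    mutable.Buffer.empty[Map[String, Long]]

  private def setFilesNumAndSizeMetric(parts: Seq[PartitionDir]): Unit = {
    driverMetrics("numFiles") = parts.map(_.fileSizes.size.toLong).sum
    driverMetrics("filesSize") = parts.flatMap(_.fileSizes).sum
  }

  private def sendDriverMetrics(): Unit = postedUpdates += driverMetrics.toMap

  // Forcing this lazy val is what initializes numFiles/filesSize.
  lazy val selectedPartitions: Seq[PartitionDir] = {
    val parts = listFiles()
    setFilesNumAndSizeMetric(parts)
    parts
  }

  lazy val dynamicallySelectedPartitions: Seq[PartitionDir] = {
    // This match plays the role of the if/else block described in the PR.
    val result = dynamicPartitionFilter match {
      case Some(keep) =>
        // Dynamic pruning: recompute the metrics for the surviving partitions.
        val pruned = selectedPartitions.filter(p => keep(p.value))
        setFilesNumAndSizeMetric(pruned)
        pruned
      case None =>
        selectedPartitions
    }
    // Change 1: post the metrics after the branch, so both paths report them.
    sendDriverMetrics()
    result
  }

  // Change 2: go through dynamicallySelectedPartitions instead of listing
  // files directly, so the lazy-val chain above is actually evaluated.
  def getPartitionArray(): Seq[PartitionDir] = dynamicallySelectedPartitions
}
```

For example, with two partitions holding three files in total and no dynamic filter, `getPartitionArray()` posts exactly one snapshot with `numFiles = 3`; with a filter keeping only the one-file partition, the posted snapshot has `numFiles = 1`.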
   
The `numFiles` metric (and other driver-side metrics such as `filesSize` and `numPartitions`) always reported 0 in Gluten's Spark 4.0 shim for two reasons:

- `sendDriverMetrics()` was never called: when there were no dynamic partition filters, `dynamicallySelectedPartitions` returned `selectedPartitions` directly, without calling `sendDriverMetrics()` to post the metrics to Spark's metrics system.

- `getPartitionArray()` bypassed the metrics initialization chain: it called `relation.location.listFiles()` directly instead of going through `dynamicallySelectedPartitions`, so the `selectedPartitions` lazy val (where `setFilesNumAndSizeMetric` is called) was never evaluated.
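The two root causes can be reproduced with a small Scala toy model of the pre-fix, no-filter path. This is a sketch under simplifying assumptions, not the real shim: `PartitionDir`, `PreFixScanModel`, `driverMetrics`, `postedUpdates`, and the `listFiles` parameter are hypothetical, and the dynamic-filter branch is omitted because only the no-filter path matters here.

```scala
import scala.collection.mutable

// Hypothetical toy model of the pre-fix behavior; NOT the actual shim code.
final case class PartitionDir(value: Int, fileSizes: Seq[Long])

class PreFixScanModel(listFiles: () => Seq[PartitionDir]) {
  // Hypothetical stand-in for the driver-side SQL metrics.
  val driverMetrics: mutable.Map[String, Long] =
    mutable.Map("numFiles" -> 0L, "filesSize" -> 0L)
  // Hypothetical record of snapshots "posted" to the metrics system.
  val postedUpdates: mutable.Buffer[Map[String, Long]] =
    mutable.Buffer.empty[Map[String, Long]]

  // The metrics are only set when this lazy val is forced.
  lazy val selectedPartitions: Seq[PartitionDir] = {
    val parts = listFiles()
    driverMetrics("numFiles") = parts.map(_.fileSizes.size.toLong).sum
    driverMetrics("filesSize") = parts.flatMap(_.fileSizes).sum
    parts
  }

  // Root cause 1: the no-filter path returns without posting any metrics.
  lazy val dynamicallySelectedPartitions: Seq[PartitionDir] = selectedPartitions

  // Root cause 2: lists files directly, so neither lazy val is ever forced.
  def getPartitionArray(): Seq[PartitionDir] = listFiles()
}
```

Calling only `getPartitionArray()` leaves `numFiles` at 0 (the second cause). Forcing `dynamicallySelectedPartitions` sets `numFiles` locally, but nothing is posted (the first cause).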
   
## How was this patch tested?
Unit tests (UT).
   


-- 
This is an automated message from the Apache Git Service.