Ma1oneZhang opened a new issue, #17025: URL: https://github.com/apache/datafusion/issues/17025
### Describe the bug --- ### DataFusion Performance Degradation with Small Batches We've identified a significant performance degradation in DataFusion when querying small data batches or data sources that return no data. Our investigation has pinpointed two primary causes for this issue: 1. **Excessive Metrics Overhead:** DataFusion's extensive metrics collection adds considerable overhead, particularly when the query processing time is minimal. The time spent recording and managing these metrics becomes a dominant factor, disproportionately impacting performance on small tasks. 2. **Fragmented Data Blocks:** Processing numerous small, fragmented data blocks (even empty ones) leads to inefficiencies. The overhead of managing these individual fragments, rather than the data itself, consumes valuable processing time, exacerbating the performance bottleneck. ### Proposed Solution To address this problem, i believe that add a new configuration option that allows users to disable metrics collection. By setting an environment variable or a configuration flag, users can choose to bypass the metrics system entirely. This change will significantly reduce the overhead associated with metrics, leading to improved performance for workloads involving small data batches or empty data sources. I believe this solution offers a practical way to balance the need for performance with the utility of having detailed metrics, giving users the flexibility to optimize DataFusion for their specific use cases. ### To Reproduce _No response_ ### Expected behavior _No response_ ### Additional context _No response_ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
