Ma1oneZhang opened a new issue, #17025:
URL: https://github.com/apache/datafusion/issues/17025

   ### Describe the bug
   
   ---
   ### DataFusion Performance Degradation with Small Batches
   
   We've identified a significant performance degradation in DataFusion when 
querying small data batches or data sources that return no data. Our 
investigation has pinpointed two primary causes for this issue:
   
   1.  **Excessive Metrics Overhead:** DataFusion's extensive metrics 
collection adds considerable overhead, particularly when the query processing 
time is minimal. The time spent recording and managing these metrics becomes a 
dominant factor, disproportionately impacting performance on small tasks.
   
   2.  **Fragmented Data Blocks:** Processing numerous small, fragmented data 
blocks (even empty ones) leads to inefficiencies. The overhead of managing 
these individual fragments, rather than the data itself, consumes valuable 
processing time, exacerbating the performance bottleneck.
   
   ### Proposed Solution
   
   To address this problem, I propose adding a new configuration option that 
allows users to disable metrics collection. By setting an environment variable 
or a configuration flag, users could bypass the metrics system 
entirely. This change should significantly reduce the overhead associated with 
metrics, improving performance for workloads involving small data 
batches or empty data sources.
   
   I believe this solution offers a practical way to balance the need for 
performance with the utility of having detailed metrics, giving users the 
flexibility to optimize DataFusion for their specific use cases.
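
   As a rough illustration of the opt-out described above, the sketch below 
gates all metric recording behind a single flag, so that a disabled operator 
skips both the timer and the atomic updates. Every name here 
(`MetricsConfig`, `OperatorMetrics`, `from_disable_flag`, `record_batch`) is 
hypothetical and chosen for illustration only; none of it is DataFusion's 
existing API, and the real metrics types may be structured quite differently.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::time::Instant;

/// Hypothetical switch for metrics collection (not an existing DataFusion option).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct MetricsConfig {
    pub enabled: bool,
}

impl MetricsConfig {
    /// Interprets a hypothetical "disable metrics" flag value.
    /// "1" or "true" disables metrics; anything else (or no value) leaves them on.
    pub fn from_disable_flag(value: Option<&str>) -> Self {
        let disabled = matches!(value.map(str::trim), Some("1") | Some("true"));
        Self { enabled: !disabled }
    }
}

/// Shared counters, allocated only when metrics are enabled.
#[derive(Default)]
struct Inner {
    output_rows: AtomicUsize,
    elapsed_nanos: AtomicUsize,
}

/// Per-operator metrics handle. `inner == None` makes every call a no-op.
#[derive(Clone, Default)]
pub struct OperatorMetrics {
    inner: Option<Arc<Inner>>,
}

impl OperatorMetrics {
    pub fn new(config: MetricsConfig) -> Self {
        Self {
            inner: config.enabled.then(|| Arc::new(Inner::default())),
        }
    }

    /// Runs `f` for one batch. The clock is read and the counters are
    /// updated only when metrics are enabled.
    pub fn record_batch<T>(&self, rows: usize, f: impl FnOnce() -> T) -> T {
        match &self.inner {
            None => f(),
            Some(inner) => {
                let start = Instant::now();
                let out = f();
                let nanos = start.elapsed().as_nanos() as usize;
                inner.elapsed_nanos.fetch_add(nanos, Ordering::Relaxed);
                inner.output_rows.fetch_add(rows, Ordering::Relaxed);
                out
            }
        }
    }

    /// Total rows recorded so far, or `None` when metrics are disabled.
    pub fn output_rows(&self) -> Option<usize> {
        self.inner
            .as_ref()
            .map(|inner| inner.output_rows.load(Ordering::Relaxed))
    }
}
```

   A caller could build the config from an environment variable with 
something like `MetricsConfig::from_disable_flag(std::env::var("EXAMPLE_DISABLE_METRICS").ok().as_deref())`, 
where the variable name is again only a placeholder. Deciding once at 
construction time keeps the per-batch cost of the disabled path to a single 
branch.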
   
   ### To Reproduce
   
   _No response_
   
   ### Expected behavior
   
   _No response_
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


