kosiew opened a new pull request, #16926:
URL: https://github.com/apache/datafusion/pull/16926

   ## Which issue does this PR close?
   
   - Closes #16984
   
   ## Rationale for this change
   
   This PR introduces a standardized way to inspect and debug memory usage 
across various components of the execution engine. The `ExplainMemory` trait 
provides a human-readable summary of memory usage, which is particularly 
helpful for debugging and monitoring purposes. This also lays the groundwork 
for improving visibility into memory consumption patterns in complex queries.
   
   ## What changes are included in this PR?
   
   - Introduced a new module `memory_report` that defines the `ExplainMemory` 
trait.
   - Blanket implementation of `ExplainMemory` for any `Accumulator` using its 
`size()` method.
   - Specific implementation of `ExplainMemory` for `MemoryReservation`.
   - A helper function `report_top_consumers` that attempts to downcast a 
memory pool and extract consumer memory usage stats.
   - Unit tests verifying the behavior of the `ExplainMemory` trait.
   - Implemented `ExplainMemory` for `GroupedHashAggregateStream` to provide 
detailed insight into memory usage of group aggregates.
   - Enabled `lz4` and `zstd` features in `datafusion/physical-plan` to allow 
test coverage of compression-related functionality.
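
The downcast step behind `report_top_consumers` in the list above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `Pool`, `TrackingPool`, `SimplePool`, the `as_any` method, and the function signature are all hypothetical stand-ins written so the example is self-contained, and DataFusion's real memory pool types may look different.

```rust
use std::any::Any;
use std::collections::HashMap;

/// Hypothetical stand-in for a memory pool interface (not DataFusion's trait).
trait Pool {
    /// Total bytes currently reserved in the pool.
    fn reserved(&self) -> usize;
    /// Exposes the concrete type so callers can attempt a downcast.
    fn as_any(&self) -> &dyn Any;
}

/// Hypothetical pool that tracks usage per named consumer.
struct TrackingPool {
    consumers: HashMap<String, usize>,
}

impl Pool for TrackingPool {
    fn reserved(&self) -> usize {
        self.consumers.values().sum()
    }
    fn as_any(&self) -> &dyn Any {
        self
    }
}

/// Hypothetical pool that only knows a total, with no per-consumer detail.
struct SimplePool {
    total: usize,
}

impl Pool for SimplePool {
    fn reserved(&self) -> usize {
        self.total
    }
    fn as_any(&self) -> &dyn Any {
        self
    }
}

/// Returns one line per consumer for the `top` largest consumers,
/// or `None` when the pool is not a `TrackingPool` (the downcast fails).
fn report_top_consumers(pool: &dyn Pool, top: usize) -> Option<String> {
    let tracking = pool.as_any().downcast_ref::<TrackingPool>()?;
    let mut entries: Vec<(&String, &usize)> = tracking.consumers.iter().collect();
    // Largest first; ties are broken by name so the output is deterministic.
    entries.sort_by(|a, b| b.1.cmp(a.1).then_with(|| a.0.cmp(b.0)));
    let lines: Vec<String> = entries
        .into_iter()
        .take(top)
        .map(|(name, bytes)| format!("{name}: {bytes} bytes"))
        .collect();
    Some(lines.join("\n"))
}
```

Returning `Option` here makes "this pool does not track consumers" an explicit outcome instead of an error, which matches the "attempts to downcast" wording above. Whether the PR reports that case the same way is not stated.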
   
   ## Are these changes tested?
   
   Yes:
   - Added unit tests for both `MemoryReservation` and `Accumulator` 
implementations of `ExplainMemory`.
   - Ensured the `GroupedHashAggregateStream`'s memory explanation logic 
compiles and integrates cleanly.
   - Existing and new tests validate the memory reporting logic.
   
   ## Are there any user-facing changes?
   
   Yes:
   - Users can now call `.explain_memory()` on memory reservations and 
accumulators to obtain human-readable memory usage summaries.
   - Enhanced memory reporting visibility for `GroupedHashAggregateStream`, 
aiding debugging and optimization efforts.
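
To make the `.explain_memory()` calls described above concrete, here is a minimal, self-contained sketch of how such a trait could be shaped. Everything in it is an assumption made for illustration: the `String` return type (the real method may return a `Result`), the output format, and the stand-in `Accumulator`, `FixedAccumulator`, and `Reservation` types, which only mimic the parts of DataFusion's `Accumulator` and `MemoryReservation` that the description mentions.

```rust
/// Hypothetical trait shape; the signature in the PR may differ.
trait ExplainMemory {
    /// Returns a human-readable summary of memory usage.
    fn explain_memory(&self) -> String;
}

/// Stand-in for DataFusion's `Accumulator`; only `size()` matters here.
trait Accumulator {
    /// Bytes currently held by this accumulator.
    fn size(&self) -> usize;
}

/// Blanket implementation: any `Accumulator` gets `explain_memory`
/// for free, derived from its `size()` method.
impl<T: Accumulator> ExplainMemory for T {
    fn explain_memory(&self) -> String {
        format!("Accumulator: {} bytes", self.size())
    }
}

/// Stand-in for a memory reservation with a named consumer.
struct Reservation {
    consumer: String,
    bytes: usize,
}

/// Specific implementation for the reservation type. This does not
/// overlap with the blanket impl because `Reservation` is not an
/// `Accumulator` and both are defined in this crate.
impl ExplainMemory for Reservation {
    fn explain_memory(&self) -> String {
        format!("Reservation({}): {} bytes", self.consumer, self.bytes)
    }
}

/// Toy accumulator that reports a fixed size, used only for the example.
struct FixedAccumulator {
    bytes: usize,
}

impl Accumulator for FixedAccumulator {
    fn size(&self) -> usize {
        self.bytes
    }
}
```

With these definitions, `FixedAccumulator { bytes: 128 }.explain_memory()` yields `Accumulator: 128 bytes`, and a `Reservation` reports its consumer name alongside its byte count.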
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

