cetra3 opened a new pull request, #20734: URL: https://github.com/apache/datafusion/pull/20734
## Which issue does this PR close?

Related to https://github.com/apache/datafusion/issues/20714. This PR does not fix any issue yet; it only highlights that memory accounting is currently inaccurate.

Here is a table of heap profiles measured with these tests:

| Test | Operator | Pool | Peak | Ratio | Assert |
|------|----------|------|------|-------|--------|
| heap_profile_repartition | RepartitionExec | 10MB | 10.7MB | 1.02x | active |
| heap_profile_hash_join | HashJoinExec | 40MB | 44.4MB | 1.06x | active |
| heap_profile_sort_merge_join | SortMergeJoinExec | 40MB | 33.3MB | 0.79x | active |
| heap_profile_sort | SortExec | 10MB | **65.9MB** | **6.28x** | TODO |
| heap_profile_hash_aggregate | GroupedHashAggregate | 10MB | **121.7MB** | **11.60x** | TODO |
| heap_profile_window | WindowAggExec | 10MB | **33.4MB** | **3.18x** | TODO |
| heap_profile_parquet_sort | Parquet + SortExec | 20MB | **66.7MB** | **3.18x** | TODO |

## Rationale for this change

This adds preliminary heap profile testing, using `dhat-rs` to record heap allocations and report memory usage for a handful of canned queries. Each test allows *some* headroom (1.1x), since other parts of the process can add overhead.

More test types can be added in the future, but already four of these seven heap profile tests exceed their memory pool by a wide margin. For the failing tests, the assertions are commented out, with some notes on why they might fail. Essentially, we should raise PRs to fix memory accounting and then add the appropriate assertions.

## What changes are included in this PR?

* Adds `dhat` as a dev dependency
* Creates a handful of tests for the heap profile.

## Are these changes tested?

Yes, as they are only tests.

## Are there any user-facing changes?

None at this stage.
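To illustrate the kind of check these tests perform, here is a minimal sketch that records peak heap usage around a closure and compares it to a pool size with 1.1x headroom. It is not the code in this PR: instead of `dhat`, it uses a hand-rolled counting allocator built only on the standard library, and the names `PeakAlloc`, `peak_heap_during`, and `within_budget` are hypothetical.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical stand-in for dhat: wraps the system allocator and
/// tracks live and peak heap bytes for the whole process.
struct PeakAlloc;

static LIVE: AtomicUsize = AtomicUsize::new(0);
static PEAK: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for PeakAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = unsafe { System.alloc(layout) };
        if !ptr.is_null() {
            // Update the live byte count, then raise the peak if needed.
            let live = LIVE.fetch_add(layout.size(), Ordering::Relaxed) + layout.size();
            PEAK.fetch_max(live, Ordering::Relaxed);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) };
        LIVE.fetch_sub(layout.size(), Ordering::Relaxed);
    }
}

#[global_allocator]
static GLOBAL: PeakAlloc = PeakAlloc;

/// Runs `f` and returns the peak number of heap bytes in use above the
/// baseline that was live when the call started.
fn peak_heap_during<F: FnOnce()>(f: F) -> usize {
    let baseline = LIVE.load(Ordering::Relaxed);
    PEAK.store(baseline, Ordering::Relaxed);
    f();
    PEAK.load(Ordering::Relaxed).saturating_sub(baseline)
}

/// Returns true when `peak` stays within `pool` bytes times `headroom`
/// (for example 1.1 to tolerate overhead elsewhere in the process).
fn within_budget(peak: usize, pool: usize, headroom: f64) -> bool {
    (peak as f64) <= (pool as f64) * headroom
}
```

A test would wrap the query execution in `peak_heap_during` and assert `within_budget(peak, pool_size, 1.1)`. Because the counters are process-wide, allocations from other threads are included, which is one reason to leave some headroom.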
