andygrove commented on PR #3536:
URL: https://github.com/apache/datafusion-comet/pull/3536#issuecomment-3910404829
I ran the queries individually and compared memory usage between main and this PR. Key findings from Claude's analysis of the results:

1. The memory shift is NOT consistent; it is highly query-dependent. Some queries see off-heap usage decrease (Q4, Q10, Q11), while others see large increases (Q7, Q12, Q13). There is no single directional trend.
2. Off-heap and JVM heap sometimes move inversely. Q11 is the clearest example: off-heap dropped 56.4% while JVM heap increased 127%. Q10 shows the same pattern (off-heap -72.7%, heap +36.5%). DataFusion 52 (DF52) appears to shift work between native and JVM memory for certain query shapes.
3. Join-heavy queries are most affected. The queries with the largest memory changes (Q4, Q7, Q10, Q11, Q12, Q13, Q21) all involve complex joins, correlated subqueries, or GROUP BY with HAVING. Simpler scan-and-aggregate queries (Q1, Q6) are stable. This points to changes in DataFusion 52's hash join/aggregate memory management.
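A minimal sketch of how the inverse off-heap/heap pattern from finding 2 could be flagged mechanically when comparing runs, assuming the quoted percentages are relative changes from main to this PR. The helper names `pct_change` and `moves_inversely` are hypothetical and not part of Comet or DataFusion; only the Q10 and Q11 figures come from this comment.

```python
def pct_change(before: float, after: float) -> float:
    """Relative change from `before` (main) to `after` (PR), in percent."""
    if before == 0:
        raise ValueError("baseline must be non-zero")
    return (after - before) / before * 100.0


def moves_inversely(offheap_pct: float, heap_pct: float) -> bool:
    """True when off-heap and JVM heap changed in opposite directions."""
    return offheap_pct * heap_pct < 0


# Percent changes quoted above (main -> this PR); other queries omitted
# because no per-query figures were reported for them.
reported = {
    "Q10": {"offheap": -72.7, "heap": 36.5},
    "Q11": {"offheap": -56.4, "heap": 127.0},
}

for query, d in reported.items():
    flag = "inverse" if moves_inversely(d["offheap"], d["heap"]) else "same direction"
    print(f"{query}: off-heap {d['offheap']:+.1f}%, heap {d['heap']:+.1f}% -> {flag}")
```

Multiplying the two deltas is just a compact sign test: a negative product means one pool shrank while the other grew, which is the signature described for Q10 and Q11.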
