Samyak2 commented on issue #22526: URL: https://github.com/apache/datafusion/issues/22526#issuecomment-4668727017
> 1. the memory is reserved before the large record batch is created, so there's no guarantee that the released memory is sufficient for reserving the large record batch (well, a slice of this large record batch, but for memory accounting purposes it doesn't matter)
>
> 3. since there's no way to transfer a memory reservation from one operator to another, other memory pool operations could happen in-between, so there's no guarantee that if you free N bytes from a reservation in an operator you could reserve the same N bytes in another operator

Very valid points. But all of these also apply to the current memory tracking, which is `get_record_batch_memory_size` (used in HashJoin, Repartition, etc.). Currently, the downstream operator will try to reserve a lot more memory than what agg released. What I'm suggesting is strictly an improvement over the current behavior.

Do you see any of these problems being made worse by a solution like https://github.com/apache/datafusion/pull/22862?
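To make the "reserves a lot more than what agg released" point concrete, here is a minimal, self-contained toy model. It does not use DataFusion's real `MemoryPool` or `MemoryReservation` API: `ToyPool`, `ToySlice`, `Accounting`, and `simulate_first_slice` are hypothetical names, and the byte counts are made up. It also assumes that buffer-based accounting charges a slice for the whole shared buffer, which is the behavior described above for the current tracking.

```rust
/// Toy bounded pool; a hypothetical stand-in, not DataFusion's `MemoryPool`.
struct ToyPool {
    limit: usize,
    used: usize,
}

impl ToyPool {
    fn new(limit: usize) -> Self {
        Self { limit, used: 0 }
    }

    /// Reserve `bytes`, failing if the pool limit would be exceeded.
    fn try_grow(&mut self, bytes: usize) -> Result<(), String> {
        if self.used + bytes > self.limit {
            return Err(format!(
                "cannot reserve {} bytes: {} of {} already used",
                bytes, self.used, self.limit
            ));
        }
        self.used += bytes;
        Ok(())
    }

    /// Release `bytes` back to the pool.
    fn shrink(&mut self, bytes: usize) {
        self.used = self.used.saturating_sub(bytes);
    }
}

/// A slice of a large batch: it covers `slice_bytes` of data but keeps a
/// shared buffer of `buffer_bytes` alive.
struct ToySlice {
    buffer_bytes: usize,
    slice_bytes: usize,
}

#[derive(Clone, Copy)]
enum Accounting {
    /// Charge the slice for the whole shared buffer (assumed current behavior).
    WholeBuffer,
    /// Charge the slice only for the bytes it covers (the suggested direction).
    SliceOnly,
}

fn charged_bytes(slice: &ToySlice, mode: Accounting) -> usize {
    match mode {
        Accounting::WholeBuffer => slice.buffer_bytes,
        Accounting::SliceOnly => slice.slice_bytes,
    }
}

/// The aggregation holds 1000 bytes in a 1200-byte pool, emits the first
/// 250-byte slice of a 1000-byte batch and releases 250 bytes; the downstream
/// operator then tries to reserve memory for that slice.
/// Returns the pool usage after the downstream reservation succeeds.
fn simulate_first_slice(mode: Accounting) -> Result<usize, String> {
    let mut pool = ToyPool::new(1200);
    pool.try_grow(1000)?; // aggregation state

    let slice = ToySlice { buffer_bytes: 1000, slice_bytes: 250 };
    pool.shrink(slice.slice_bytes); // agg releases what it just emitted

    pool.try_grow(charged_bytes(&slice, mode))?; // downstream reservation
    Ok(pool.used)
}
```

In this toy, `simulate_first_slice(Accounting::WholeBuffer)` fails because the downstream operator asks for 1000 bytes after only 250 were released, while `simulate_first_slice(Accounting::SliceOnly)` succeeds. Neither mode addresses the quoted points 1 and 3: another operator could still grab the released bytes in between, which is why the claim is only that the situation gets no worse.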
