bharath-techie commented on issue #19386:
URL: https://github.com/apache/datafusion/issues/19386#issuecomment-3675113488

   I don't think it's a memory leak. The record batch size in `GroupHashExecStream` 
is `4196909056` bytes, and since we slice that record batch and send the slices to 
consumers, each sliced batch's size is still reported as the full `4196909056` bytes. 
   
   ```
   Record batch size during insert : 4196909056
   self size : 48 , capacity : 3 , heap size : 60, batch size : 4196909056
   size during get : 4196909284
   
   Record batch size during insert : 4196909056
   self size : 48 , capacity : 3 , heap size : 60, batch size : 8393818112
   size during get : 8393818340
   
   Record batch size during insert : 4196909056
   self size : 48 , capacity : 3 , heap size : 60, batch size : 12590727168
   size during get : 12590727396
   ```
   
   See https://github.com/apache/datafusion/issues/9562 for more background:
   > As we see in https://github.com/apache/datafusion/issues/9417, if there are 
upstream operators like TopK that hold references to any of these sliced 
RecordBatchs, those slices are treated as though they were an additional 
allocation that needs to be tracked 
([source](https://github.com/apache/arrow-datafusion/blob/e642cc2a94f38518d765d25c8113523aedc29198/datafusion/physical-plan/src/topk/mod.rs#L576))
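
   The cumulative growth in the log above (`4196909056`, then `8393818112`, then `12590727168`) is exactly what you get when each slice reports the size of the shared backing buffer. The following is a toy, stdlib-only sketch of that accounting behavior; `SliceView` and `memory_size` are illustrative stand-ins, not Arrow's actual `RecordBatch::slice` / `get_array_memory_size` API:

   ```rust
   use std::rc::Rc;

   // Toy stand-in for Arrow's zero-copy slicing: a slice does not copy
   // data, it just keeps a reference to the full backing buffer.
   #[allow(dead_code)]
   struct SliceView {
       buffer: Rc<Vec<u8>>,
       offset: usize,
       len: usize,
   }

   impl SliceView {
       fn slice(buffer: &Rc<Vec<u8>>, offset: usize, len: usize) -> Self {
           SliceView { buffer: Rc::clone(buffer), offset, len }
       }

       // Hypothetical accounting that mirrors the reported behavior:
       // the size charged is that of the shared backing buffer,
       // not of the slice itself.
       fn memory_size(&self) -> usize {
           self.buffer.capacity()
       }
   }

   fn main() {
       let backing = Rc::new(vec![0u8; 1024]);

       // Three non-overlapping slices of the same 1024-byte buffer.
       let slices: Vec<SliceView> = (0..3)
           .map(|i| SliceView::slice(&backing, i * 256, 256))
           .collect();

       // A reservation that sums per-slice sizes grows to 3 * 1024
       // even though only 1024 bytes are actually allocated.
       let tracked: usize = slices.iter().map(|s| s.memory_size()).sum();
       println!("tracked = {tracked}"); // prints "tracked = 3072"
   }
   ```

   This is the same multiplication seen in the log, just scaled down: three slices of one `4196909056`-byte batch account as roughly `12590727168` bytes.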
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
