avantgardnerio opened a new issue, #13831:
URL: https://github.com/apache/datafusion/issues/13831

   ### Describe the bug
   
   When aggregating large text fields with a `group by`, it was observed that `group_aggregate_batch()` can abort the process with an OOM rather than returning an error, despite ostensibly tracking its allocations with the `MemoryPool`.
   
   Query:
   
   ```
   select truncated_time, count(*) AS cnt
   from (
       select
           truncated_time, k8s_deployment_name, message
       from (
           SELECT
               priorityclass,
               timestamp,
               date_trunc('day', timestamp) AS truncated_time,
               k8s_deployment_name,
               message
           FROM agg_oom
           where priorityclass != 'low'
       )
       group by truncated_time, k8s_deployment_name, message
   ) group by truncated_time
   ```
   
   This was run against 8 parquet files of ~50 MB each, where the `message` column can contain strings up to 8192 bytes. When profiled, this was by far the largest use of memory:
   
   
![image](https://github.com/user-attachments/assets/8e2478f8-78ff-4850-8d37-5133f6f4579d)
   
   With logging added, we can see it fails while interning:
   
   ```
   converting 3 rows
   interning 8192 rows with 1486954 bytes
   interned 8192 rows, now I'm 13054176 bytes
   resizing to 14103171
   resizing to 14103171
   reserving 28206342 extra bytes
   converting 3 rows
   interning 8192 rows with 1350859 bytes
   memory allocation of 25690112 bytes failed
   Aborted (core dumped)
   ```
   
   ### To Reproduce
   
   1. set up a test with
   
   ```
    use datafusion::execution::runtime_env::RuntimeConfig;

    let memory_limit = 125_000_000;
    let memory_fraction = 1.0;
    let rt_config = RuntimeConfig::new()
        .with_memory_limit(memory_limit, memory_fraction);
   ```
   
   2. set `ulimit -v 1152000`
   
   3. query some parquet files with long strings
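
   For reference, a minimal end-to-end sketch of such a test (the parquet path is hypothetical and the query is a simplified stand-in for the one above; the runtime configuration follows step 1):

   ```
   use std::sync::Arc;

   use datafusion::error::Result;
   use datafusion::execution::runtime_env::{RuntimeConfig, RuntimeEnv};
   use datafusion::prelude::{ParquetReadOptions, SessionConfig, SessionContext};

   #[tokio::main]
   async fn main() -> Result<()> {
       // ~125 MB greedy pool, matching the RuntimeConfig snippet in step 1
       let rt_config = RuntimeConfig::new().with_memory_limit(125_000_000, 1.0);
       let runtime = Arc::new(RuntimeEnv::new(rt_config)?);
       let ctx = SessionContext::new_with_config_rt(SessionConfig::new(), runtime);

       // hypothetical path to the ~50 MB parquet files with long `message` strings
       ctx.register_parquet("agg_oom", "data/agg_oom/", ParquetReadOptions::default())
           .await?;

       // simplified stand-in for the query above
       let df = ctx
           .sql("select date_trunc('day', timestamp) as truncated_time, count(*) as cnt \
                 from agg_oom group by truncated_time")
           .await?;

       // with correct memory accounting this should return an error, not abort
       df.collect().await?;
       Ok(())
   }
   ```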
   
   ### Expected behavior
   
   `group_aggregate_batch()` doesn't make the assumption:
   
   ```
            // Here we can ignore `insufficient_capacity_err` because we will spill later,
            // but at least one batch should fit in the memory
   ```
   
   But instead accounts for the fact that adding 1 row to a million-row `Vec` doesn't allocate space for 1,000,001 rows, but rather 2,000,000, because `Vec` grows by roughly doubling its capacity.
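
   The doubling can be observed with a plain `Vec` (illustration only, not DataFusion code; the exact growth factor is an implementation detail of the standard library, but growth is geometric in practice):

   ```
   // Watch the capacity jump as elements are pushed one at a time.
   let mut v: Vec<u8> = Vec::new();
   let mut last_cap = v.capacity();
   for _ in 0..2_000_000u32 {
       v.push(0);
       if v.capacity() != last_cap {
           // Capacity roughly doubles each time, so pushing a single element past a
           // boundary can briefly require old_capacity + new_capacity bytes while
           // the data is copied into the new allocation.
           println!("len = {:>9}, capacity = {:>9}", v.len(), v.capacity());
           last_cap = v.capacity();
       }
   }
   ```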
   
   ### Additional context
   
   Proposed solution: 
   
   Add

   ```
   self.reservation.try_resize(self.reservation.size() * 2)?;
   ```

   above

   ```
   self.group_values
       .intern(group_values, &mut self.current_group_indices)?;
   ```
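
   In context, the change would look roughly like the following (a sketch only; surrounding code of `group_aggregate_batch()` elided):

   ```
   // Sketch: grow the reservation ahead of interning so a potential doubling of
   // the group values is accounted for, surfacing a memory-pool error instead of
   // letting the allocator abort the process.
   self.reservation.try_resize(self.reservation.size() * 2)?;

   // Existing call that may reallocate (and thus double) the interned group values.
   self.group_values
       .intern(group_values, &mut self.current_group_indices)?;
   ```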

