samuelcolvin opened a new issue, #10511:
URL: https://github.com/apache/datafusion/issues/10511

   ### Describe the bug
   
   I'm running the following query:
   
   ```sql
   select span_name from records order by bit_length(attributes) desc limit 20
   ```
   
   It runs out of memory with a 20GB memory limit (`RuntimeConfig::new().with_memory_limit(20 * 1024 * 1024 * 1024, 0.8)`), and passes with 30GB allowed.
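
For reference, the second argument to `with_memory_limit` is a fraction of the limit. A minimal sketch of the resulting budget, assuming the pool size is simply `max_memory * memory_fraction` truncated to an integer (the helper name `effective_pool_bytes` is hypothetical and not part of DataFusion):

```rust
/// Bytes usable by the memory pool for a given limit and fraction.
/// Assumption: the pool is sized as `max_memory * memory_fraction`, truncated.
fn effective_pool_bytes(max_memory: usize, memory_fraction: f64) -> usize {
    (max_memory as f64 * memory_fraction) as usize
}

fn main() {
    const GIB: usize = 1024 * 1024 * 1024;
    // The two limits mentioned in the report: 20GB fails, 30GB passes.
    for limit_gib in [20usize, 30] {
        let pool = effective_pool_bytes(limit_gib * GIB, 0.8);
        println!(
            "{limit_gib} GiB limit -> {pool} bytes usable ({:.1} GiB)",
            pool as f64 / GIB as f64
        );
    }
}
```

Under that assumption, the failing run had roughly 16 GiB available to the pool rather than the full 20.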
   
   Error message is:
   ```
   Failed to allocate additional 25887088 bytes for ExternalSorterMerge[1] with 585120448 bytes already allocated - maximum available is 23605759
   ```
   
   The point is that in theory this query only needs to hold the `span_name`s 
of the 20 records with the longest `attributes` in memory.
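
To illustrate that claim, here is a minimal sketch of a bounded top-k selection that never keeps more than `k` `span_name` values at once. It is not how DataFusion executes the query; the function name `top_k_by_attr_len` and the in-memory row representation are assumptions made for the example.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Return the `k` span names whose `attributes` are longest, longest first.
/// Rows are `(span_name, attributes)` pairs (a hypothetical row layout).
fn top_k_by_attr_len<'a, I>(rows: I, k: usize) -> Vec<String>
where
    I: IntoIterator<Item = (&'a str, &'a str)>,
{
    // `Reverse` turns the max-heap into a min-heap, so the smallest
    // retained key is always on top and is cheap to evict.
    let mut heap: BinaryHeap<Reverse<(usize, String)>> = BinaryHeap::with_capacity(k);
    for (span_name, attributes) in rows {
        let key = attributes.len() * 8; // bit length of the UTF-8 bytes
        let smallest_kept = heap.peek().map(|entry| (entry.0).0);
        if heap.len() < k {
            heap.push(Reverse((key, span_name.to_owned())));
        } else if smallest_kept.map_or(false, |min_key| key > min_key) {
            // The new row beats the weakest retained row: swap them.
            heap.pop();
            heap.push(Reverse((key, span_name.to_owned())));
        }
    }
    // Ascending order of `Reverse` is descending order of the key.
    heap.into_sorted_vec()
        .into_iter()
        .map(|Reverse((_, name))| name)
        .collect()
}

fn main() {
    let rows = vec![("a", "x"), ("b", "xxx"), ("c", "xx"), ("d", "xxxx")];
    println!("{:?}", top_k_by_attr_len(rows, 2));
}
```

With this approach the retained state is `k` names plus `k` integer keys, independent of the number of input rows.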
   
   But even if it chose to hold all `span_name` in memory, it shouldn't need 
this much memory:
   * there are "only" 12_980_628 rows
   * with `sum(bit_length(span_name)) = 1_038_805_400` aka ~1GB, for all rows
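
As a rough sanity check on these figures, the sketch below converts the reported total into bytes and a per-row average. It assumes `bit_length` counts bits (8 per byte of the string), in which case the byte total is an eighth of the number above; the constant and function names are made up for the example.

```rust
// Figures quoted in the report.
const ROWS: u64 = 12_980_628;
const TOTAL_BITS: u64 = 1_038_805_400;

/// Convert a bit count to whole bytes (assumes 8 bits per byte of string data).
fn bits_to_bytes(bits: u64) -> u64 {
    bits / 8
}

/// Average number of bytes per row for a given total.
fn avg_bytes_per_row(total_bytes: u64, rows: u64) -> f64 {
    total_bytes as f64 / rows as f64
}

fn main() {
    let total_bytes = bits_to_bytes(TOTAL_BITS);
    println!("total span_name payload: {total_bytes} bytes");
    println!(
        "average span_name size: {:.1} bytes/row",
        avg_bytes_per_row(total_bytes, ROWS)
    );
}
```

Either way the string payload is far below the 20GB limit; per-row offsets and other buffer overhead are not included in this estimate.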
   
   ### To Reproduce
   
   The dataset and code aren't public, but it shouldn't be too hard to reproduce with a table containing two text columns.
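
One possible way to build such a table is to generate a synthetic CSV with a short `span_name` column and a longer, variable-length `attributes` column. The generator below is only a sketch: the value shapes, length range, and function name are assumptions, and it has not been confirmed to trigger the error.

```rust
use std::io::{self, Write};

/// Write a header plus `rows` CSV rows with two text columns.
/// The column contents are synthetic placeholders, not the real data.
fn write_synthetic_records<W: Write>(out: &mut W, rows: u64) -> io::Result<()> {
    writeln!(out, "span_name,attributes")?;
    let mut state: u64 = 0x9E37_79B9_7F4A_7C15; // fixed seed for repeatability
    for i in 0..rows {
        // xorshift64 step: cheap deterministic pseudo-random numbers.
        state ^= state << 13;
        state ^= state >> 7;
        state ^= state << 17;
        // Attribute length varies between 16 and 2063 bytes (arbitrary choice).
        let attr_len = 16 + (state % 2048) as usize;
        writeln!(out, "span-{i},{}", "a".repeat(attr_len))?;
    }
    Ok(())
}

fn main() -> io::Result<()> {
    let stdout = io::stdout();
    let mut out = io::BufWriter::new(stdout.lock());
    // Raise the row count towards the ~13M rows in the report to stress memory.
    write_synthetic_records(&mut out, 1_000)?;
    out.flush()
}
```

Redirect the output to a file and register it as the `records` table to run the query against it.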
   
   ### Expected behavior
   
   Ideally a query like this would have a far more modest memory footprint.
   
   ### Additional context
   
   Using datafusion v38.0.0, same error with mimalloc and without.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

