alamb commented on code in PR #9619:
URL: https://github.com/apache/arrow-datafusion/pull/9619#discussion_r1526354142
##########
datafusion/core/tests/memory_limit.rs:
##########
@@ -301,6 +301,28 @@ async fn sort_spill_reservation() {
test.with_config(config).with_expected_success().run().await;
}
+#[tokio::test]
+async fn oom_recursive_cte() {
+ TestCase::new()
+ .with_query(
+ "WITH RECURSIVE nodes AS (
+ SELECT 1 as id
+ UNION ALL
+ SELECT UNNEST(RANGE(id+1, id+1000)) as id
+ FROM nodes
+ WHERE id < 10
+ )
+ SELECT * FROM nodes;",
+ )
+ .with_expected_errors(vec![
Review Comment:
👍
##########
datafusion/physical-plan/src/memory.rs:
##########
@@ -236,6 +237,8 @@ impl MemoryExec {
pub struct MemoryStream {
/// Vector of record batches
data: Vec<RecordBatch>,
+ /// Optional memory reservation bound to the data, freed on drop
Review Comment:
I was a little worried at first that this optional API makes it easy to
forget to provide reservation. However I see now that the reservation is used
only with the recursive CTE case
(not for this PR) In general I wondered if we should *always* have a memory
reservation to `MemoryStream` 🤔 I think that would double count batches from a
MemTable however, so it isn't an obviously good improvement
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]