Re: [PR] feat: track memory usage for recursive CTE, enable recursive CTEs by default [arrow-datafusion]

via GitHub Fri, 15 Mar 2024 07:19:45 -0700


alamb commented on code in PR #9619:
URL: https://github.com/apache/arrow-datafusion/pull/9619#discussion_r1526354142



##########
datafusion/core/tests/memory_limit.rs:
##########
@@ -301,6 +301,28 @@ async fn sort_spill_reservation() {
     test.with_config(config).with_expected_success().run().await;
 }
 
+#[tokio::test]
+async fn oom_recursive_cte() {
+    TestCase::new()
+        .with_query(
+            "WITH RECURSIVE nodes AS (
+            SELECT 1 as id
+            UNION ALL
+            SELECT UNNEST(RANGE(id+1, id+1000)) as id
+            FROM nodes
+            WHERE id < 10
+        )
+        SELECT * FROM nodes;",
+        )
+        .with_expected_errors(vec![

Review Comment:
   👍 



##########
datafusion/physical-plan/src/memory.rs:
##########
@@ -236,6 +237,8 @@ impl MemoryExec {
 pub struct MemoryStream {
     /// Vector of record batches
     data: Vec<RecordBatch>,
+    /// Optional memory reservation bound to the data, freed on drop

Review Comment:
   I was a little worried at first that this optional API makes it easy to 
forget to provide reservation. However I see now that the reservation is used 
only with the recursive CTE case
   
   (not for this PR) In general I wondered if we should *always* have a memory 
reservation to `MemoryStream`  🤔 I think that would double count batches from a 
MemTable however, so it isn't an obviously good improvement



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] feat: track memory usage for recursive CTE, enable recursive CTEs by default [arrow-datafusion]

Reply via email to