kosiew commented on code in PR #20226:
URL: https://github.com/apache/datafusion/pull/20226#discussion_r2792490336


##########
datafusion/physical-plan/src/sorts/sort.rs:
##########
@@ -1739,6 +1740,48 @@ mod tests {
         Ok(())
     }
 
+    #[tokio::test]
+    async fn test_external_sorter_when_resource_exhausted() -> Result<()> {

Review Comment:
   This overlaps heavily with `test_batch_reservation_error` (same path: 
external sort memory exhaustion). 
   
   Consider extracting a small helper that builds the constrained `TaskContext` 
+ `SortExec` used by both tests to keep maintenance lower if the OOM/context 
mechanics evolve.



##########
datafusion/physical-plan/src/sorts/sort.rs:
##########
@@ -1739,6 +1740,48 @@ mod tests {
         Ok(())
     }
 
+    #[tokio::test]
+    async fn test_external_sorter_when_resource_exhausted() -> Result<()> {
+        let session_config = SessionConfig::new().with_batch_size(100);
+        // Memory allocation is being set lower than input data memory 
footprint
+        // to cause ResourcesExhausted error
+        let runtime = RuntimeEnvBuilder::new()
+            .with_memory_limit(1024, 1.0)
+            .build_arc()?;
+        let task_ctx = Arc::new(
+            TaskContext::default()
+                .with_session_config(session_config)
+                .with_runtime(runtime),
+        );
+
+        // The input has 200 partitions, each partition has a batch containing 
100 rows.
+        // Each row has a single Utf8 column, the Utf8 string values are 
roughly 42 bytes.
+        // The total size of the input is roughly 8.4 KB.
+        let input = test::scan_partitioned_utf8(200);
+        let schema = input.schema();
+
+        let sort_exec = Arc::new(SortExec::new(
+            [PhysicalSortExpr {
+                expr: col("i", &schema)?,
+                options: SortOptions::default(),
+            }]
+            .into(),
+            Arc::new(CoalescePartitionsExec::new(input)),
+        ));
+
+        let result = collect(Arc::clone(&sort_exec) as _, 
Arc::clone(&task_ctx)).await;
+        let err = result.unwrap_err();
+        assert!(
+            matches!(err.find_root(), DataFusionError::ResourcesExhausted(_)),
+            "Assertion failed: expected a ResourcesExhausted error, but got: 
{err:?}"

Review Comment:
   The test matches one long literal sentence including punctuation and 
wording. A small wording tweak could cause unrelated failures. 
   
   I think using `assert_contains!` with focused fragments would make this less 
brittle.



##########
datafusion/physical-plan/src/sorts/sort.rs:
##########
@@ -1739,6 +1740,48 @@ mod tests {
         Ok(())
     }
 
+    #[tokio::test]
+    async fn test_external_sorter_when_resource_exhausted() -> Result<()> {
+        let session_config = SessionConfig::new().with_batch_size(100);
+        // Memory allocation is being set lower than input data memory 
footprint
+        // to cause ResourcesExhausted error
+        let runtime = RuntimeEnvBuilder::new()
+            .with_memory_limit(1024, 1.0)
+            .build_arc()?;
+        let task_ctx = Arc::new(
+            TaskContext::default()
+                .with_session_config(session_config)
+                .with_runtime(runtime),
+        );
+
+        // The input has 200 partitions, each partition has a batch containing 
100 rows.
+        // Each row has a single Utf8 column, the Utf8 string values are 
roughly 42 bytes.
+        // The total size of the input is roughly 8.4 KB.

Review Comment:
   200 partitions × 100 rows × ~42 bytes = 840 KB 😄 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to