Re: [PR] Speedup sqllogictests by running long running tests first [datafusion]

via GitHub Sun, 01 Mar 2026 22:49:04 -0800


kosiew commented on code in PR #20576:
URL: https://github.com/apache/datafusion/pull/20576#discussion_r2870797573



##########
datafusion/sqllogictest/bin/sqllogictests.rs:
##########
@@ -75,6 +77,55 @@ struct FileTiming {
     elapsed: Duration,
 }
 
+/// TEST PRIORITY
+///
+/// Heuristically prioritize some test to run earlier.
+///
+/// Prioritizes test to run earlier if they are known to be long running (as
+/// each test file itself is run sequentially, but multiple test files are run
+/// in parallel.
+///
+/// Tests not listed here will run after the listed tests in an arbitrary 
order.
+///
+/// You can find the top longest running tests by running `--timing-summary` 
mode.
+/// For example
+///
+/// ```shell
+/// $  cargo test --profile=ci  --test sqllogictests -- --timing-summary top
+/// ...
+/// Per-file elapsed summary (deterministic):
+/// 1.    5.375s  push_down_filter_regression.slt
+/// 2.    3.174s  aggregate.slt
+/// 3.    3.158s  imdb.slt
+/// 4.    2.793s  joins.slt
+/// 5.    2.505s  array.slt
+/// 6.    2.265s  aggregate_skip_partial.slt
+/// 7.    2.260s  window.slt
+/// 8.    1.677s  group_by.slt
+/// 9.    0.973s  datetime/timestamps.slt
+/// 10.    0.822s  cte.slt
+/// ```
+static TEST_PRIORITY: LazyLock<HashMap<PathBuf, usize>> = LazyLock::new(|| {
+    [
+        (PathBuf::from("push_down_filter_regression.slt"), 0), // longest 
running, so run first.
+        (PathBuf::from("aggregate.slt"), 1),
+        (PathBuf::from("joins.slt"), 2),
+        (PathBuf::from("imdb.slt"), 3),
+        (PathBuf::from("array.slt"), 4),
+        (PathBuf::from("aggregate_skip_partial.slt"), 5),
+        (PathBuf::from("window.slt"), 6),
+        (PathBuf::from("group_by.slt"), 7),
+        (PathBuf::from("datetime/timestamps.slt"), 8),
+        (PathBuf::from("cte.slt"), 9),
+    ]
+    .into_iter()
+    .collect()
+});
+
+/// Default priority for tests not in the TEST_PRIORITY map. Tests with lower
+/// priority values run first.
+static DEFAULT_PRIORITY: usize = 100;

Review Comment:
   nit: can this a `const` instead of `static`?



##########
datafusion/sqllogictest/bin/sqllogictests.rs:
##########
@@ -851,7 +902,21 @@ fn read_test_files(options: &Options) -> 
Result<Vec<TestFile>> {
         paths.append(&mut sqlite_paths)
     }
 
-    Ok(paths)
+    Ok(sort_tests(paths))
+}
+
+/// Sort the tests heuristically by order of "priority"
+///
+/// Prioritizes test to run earlier if they are known to be long running (as
+/// each test file itself is run sequentially, but multiple test files are run
+/// in parallel.
+fn sort_tests(mut tests: Vec<TestFile>) -> Vec<TestFile> {

Review Comment:
   Can we add a deterministic tie-breaker in `sort_tests` (for equal priority) 
using `relative_path`, e.g. `sort_by_key(|f| (priority, 
f.relative_path.clone()))` to keep run order stable?
   
   This would also benefit from a small unit test covering:
   - prioritized files are first,
   - non-prioritized files keep deterministic ordering



##########
datafusion/sqllogictest/bin/sqllogictests.rs:
##########
@@ -75,6 +77,55 @@ struct FileTiming {
     elapsed: Duration,
 }
 
+/// TEST PRIORITY
+///
+/// Heuristically prioritize some test to run earlier.
+///
+/// Prioritizes test to run earlier if they are known to be long running (as
+/// each test file itself is run sequentially, but multiple test files are run
+/// in parallel.
+///
+/// Tests not listed here will run after the listed tests in an arbitrary 
order.
+///
+/// You can find the top longest running tests by running `--timing-summary` 
mode.
+/// For example
+///
+/// ```shell
+/// $  cargo test --profile=ci  --test sqllogictests -- --timing-summary top
+/// ...
+/// Per-file elapsed summary (deterministic):
+/// 1.    5.375s  push_down_filter_regression.slt
+/// 2.    3.174s  aggregate.slt
+/// 3.    3.158s  imdb.slt
+/// 4.    2.793s  joins.slt
+/// 5.    2.505s  array.slt
+/// 6.    2.265s  aggregate_skip_partial.slt
+/// 7.    2.260s  window.slt
+/// 8.    1.677s  group_by.slt
+/// 9.    0.973s  datetime/timestamps.slt
+/// 10.    0.822s  cte.slt
+/// ```
+static TEST_PRIORITY: LazyLock<HashMap<PathBuf, usize>> = LazyLock::new(|| {

Review Comment:
   Would a simpler static slice be sufficient here (eg &[(&str, usize)] with a 
small helper) instead of LazyLock<HashMap<PathBuf, usize>>?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] Speedup sqllogictests by running long running tests first [datafusion]

Reply via email to