kosiew commented on code in PR #20576:
URL: https://github.com/apache/datafusion/pull/20576#discussion_r2870797573
##########
datafusion/sqllogictest/bin/sqllogictests.rs:
##########
@@ -75,6 +77,55 @@ struct FileTiming {
elapsed: Duration,
}
+/// TEST PRIORITY
+///
+/// Heuristically prioritize some test to run earlier.
+///
+/// Prioritizes test to run earlier if they are known to be long running (as
+/// each test file itself is run sequentially, but multiple test files are run
+/// in parallel.
+///
+/// Tests not listed here will run after the listed tests in an arbitrary order.
+///
+/// You can find the top longest running tests by running `--timing-summary` mode.
+/// For example
+///
+/// ```shell
+/// $ cargo test --profile=ci --test sqllogictests -- --timing-summary top
+/// ...
+/// Per-file elapsed summary (deterministic):
+/// 1. 5.375s push_down_filter_regression.slt
+/// 2. 3.174s aggregate.slt
+/// 3. 3.158s imdb.slt
+/// 4. 2.793s joins.slt
+/// 5. 2.505s array.slt
+/// 6. 2.265s aggregate_skip_partial.slt
+/// 7. 2.260s window.slt
+/// 8. 1.677s group_by.slt
+/// 9. 0.973s datetime/timestamps.slt
+/// 10. 0.822s cte.slt
+/// ```
+static TEST_PRIORITY: LazyLock<HashMap<PathBuf, usize>> = LazyLock::new(|| {
+ [
+ (PathBuf::from("push_down_filter_regression.slt"), 0), // longest running, so run first.
+ (PathBuf::from("aggregate.slt"), 1),
+ (PathBuf::from("joins.slt"), 2),
+ (PathBuf::from("imdb.slt"), 3),
+ (PathBuf::from("array.slt"), 4),
+ (PathBuf::from("aggregate_skip_partial.slt"), 5),
+ (PathBuf::from("window.slt"), 6),
+ (PathBuf::from("group_by.slt"), 7),
+ (PathBuf::from("datetime/timestamps.slt"), 8),
+ (PathBuf::from("cte.slt"), 9),
+ ]
+ .into_iter()
+ .collect()
+});
+
+/// Default priority for tests not in the TEST_PRIORITY map. Tests with lower
+/// priority values run first.
+static DEFAULT_PRIORITY: usize = 100;
Review Comment:
nit: can this be a `const` instead of a `static`?
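For illustration, a minimal sketch of the `const` form. `priority_or_default` is a hypothetical helper, not part of the PR; it only shows that call sites read the value the same way. A `const` is inlined at each use, whereas a `static` has a single fixed address, which this value does not appear to need.

```rust
/// Default priority for tests not in the priority table. Tests with lower
/// priority values run first.
const DEFAULT_PRIORITY: usize = 100;

/// Hypothetical helper (not in the PR): fall back to the default when a
/// file has no explicit priority.
fn priority_or_default(known: Option<usize>) -> usize {
    known.unwrap_or(DEFAULT_PRIORITY)
}
```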
##########
datafusion/sqllogictest/bin/sqllogictests.rs:
##########
@@ -851,7 +902,21 @@ fn read_test_files(options: &Options) -> Result<Vec<TestFile>> {
paths.append(&mut sqlite_paths)
}
- Ok(paths)
+ Ok(sort_tests(paths))
+}
+
+/// Sort the tests heuristically by order of "priority"
+///
+/// Prioritizes test to run earlier if they are known to be long running (as
+/// each test file itself is run sequentially, but multiple test files are run
+/// in parallel.
+fn sort_tests(mut tests: Vec<TestFile>) -> Vec<TestFile> {
Review Comment:
Can we add a deterministic tie-breaker in `sort_tests` for files with equal
priority, using `relative_path`? For example,
`sort_by_key(|f| (priority, f.relative_path.clone()))` would keep the run
order stable.
This would also benefit from a small unit test covering:
- prioritized files run first,
- non-prioritized files keep a deterministic order.
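A rough sketch of what the tie-breaker could look like. The `TestFile` struct here is a simplified stand-in with only the `relative_path` field, and `priority_of` is a hypothetical lookup with a shortened table; neither is the PR's actual code.

```rust
use std::path::{Path, PathBuf};

/// Simplified stand-in for the runner's `TestFile` (assumption: only the
/// field needed for sorting is modelled here).
#[derive(Debug, Clone, PartialEq, Eq)]
struct TestFile {
    relative_path: PathBuf,
}

const DEFAULT_PRIORITY: usize = 100;

/// Hypothetical priority lookup with a shortened table; the PR looks the
/// path up in `TEST_PRIORITY` instead.
fn priority_of(path: &Path) -> usize {
    match path.to_str() {
        Some("push_down_filter_regression.slt") => 0,
        Some("aggregate.slt") => 1,
        Some("joins.slt") => 2,
        _ => DEFAULT_PRIORITY,
    }
}

/// Sort by priority first, then by relative path, so that files sharing a
/// priority (including all unlisted files) always land in the same order.
fn sort_tests(mut tests: Vec<TestFile>) -> Vec<TestFile> {
    tests.sort_by_key(|f| (priority_of(&f.relative_path), f.relative_path.clone()));
    tests
}
```

A unit test along these lines could feed in `zeta.slt`, `joins.slt`, `alpha.slt`, `aggregate.slt` and assert that the result is `aggregate.slt`, `joins.slt`, `alpha.slt`, `zeta.slt`.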
##########
datafusion/sqllogictest/bin/sqllogictests.rs:
##########
@@ -75,6 +77,55 @@ struct FileTiming {
elapsed: Duration,
}
+/// TEST PRIORITY
+///
+/// Heuristically prioritize some test to run earlier.
+///
+/// Prioritizes test to run earlier if they are known to be long running (as
+/// each test file itself is run sequentially, but multiple test files are run
+/// in parallel.
+///
+/// Tests not listed here will run after the listed tests in an arbitrary order.
+///
+/// You can find the top longest running tests by running `--timing-summary` mode.
+/// For example
+///
+/// ```shell
+/// $ cargo test --profile=ci --test sqllogictests -- --timing-summary top
+/// ...
+/// Per-file elapsed summary (deterministic):
+/// 1. 5.375s push_down_filter_regression.slt
+/// 2. 3.174s aggregate.slt
+/// 3. 3.158s imdb.slt
+/// 4. 2.793s joins.slt
+/// 5. 2.505s array.slt
+/// 6. 2.265s aggregate_skip_partial.slt
+/// 7. 2.260s window.slt
+/// 8. 1.677s group_by.slt
+/// 9. 0.973s datetime/timestamps.slt
+/// 10. 0.822s cte.slt
+/// ```
+static TEST_PRIORITY: LazyLock<HashMap<PathBuf, usize>> = LazyLock::new(|| {
Review Comment:
Would a simpler static slice be sufficient here (e.g. `&[(&str, usize)]` with a
small helper) instead of `LazyLock<HashMap<PathBuf, usize>>`?
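Something like the following, as a sketch only. `test_priority` is a hypothetical helper name, and the linear scan assumes the table stays small (ten entries in this PR); the priority values are copied from the diff.

```rust
use std::path::Path;

const DEFAULT_PRIORITY: usize = 100;

/// Known long-running test files; lower values run first.
const TEST_PRIORITY: &[(&str, usize)] = &[
    ("push_down_filter_regression.slt", 0),
    ("aggregate.slt", 1),
    ("joins.slt", 2),
    ("imdb.slt", 3),
    ("array.slt", 4),
    ("aggregate_skip_partial.slt", 5),
    ("window.slt", 6),
    ("group_by.slt", 7),
    ("datetime/timestamps.slt", 8),
    ("cte.slt", 9),
];

/// Hypothetical helper: linear scan over the table, falling back to
/// `DEFAULT_PRIORITY` for files that are not listed.
fn test_priority(relative_path: &Path) -> usize {
    TEST_PRIORITY
        .iter()
        .find(|entry| Path::new(entry.0) == relative_path)
        .map(|entry| entry.1)
        .unwrap_or(DEFAULT_PRIORITY)
}
```

This avoids the `LazyLock` initialization and the `PathBuf` allocations, at the cost of an O(n) lookup per file, which should be negligible for a table this size.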