This is an automated email from the ASF dual-hosted git repository.
zhengruifeng pushed a commit to branch branch-4.x
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-4.x by this push:
new d94fdcaa9437 [SPARK-56744][INFRA] Document test base class hierarchy in AGENTS.md
d94fdcaa9437 is described below
commit d94fdcaa943795bd1026cacc4a47a830891132ca
Author: Ruifeng Zheng <[email protected]>
AuthorDate: Wed May 13 19:21:47 2026 +0800
[SPARK-56744][INFRA] Document test base class hierarchy in AGENTS.md
### What changes were proposed in this pull request?
Add a `Scala Test Base Classes` section to `AGENTS.md` that documents the
layered Scala test base hierarchy in this repo and how to pick a base class for
a new test suite. Spark uses the `AnyFunSuite` ScalaTest style throughout, and
the chain is:
    SparkFunSuite                      (core)
      <- PlanTest                      (sql/catalyst)
        <- QueryTest                   (sql/core)
`QueryTest` declares `spark: SparkSession` abstractly via
`SparkSessionProvider`, so a concrete SQL test suite mixes in one of the
session-providing traits:
    QueryTest                          (abstract `spark`)
      + SharedSparkSession (sql/core)  -> classic in-process `TestSparkSession`
      + TestHiveSingleton  (sql/hive)  -> Hive-backed `TestHive` session
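The session-providing pattern described above can be modelled in plain Scala. This is only a sketch: `FakeSession`, `SessionProviderLike`, `QueryTestLike`, `SharedSessionLike` and `HiveSingletonLike` are hypothetical stand-ins, not Spark's real classes, and the real traits do considerably more (session lifecycle, configuration, cleanup).

```scala
// Hypothetical stand-ins that model the pattern; none of these are Spark classes.
final case class FakeSession(kind: String)

trait SessionProviderLike {
  // Declared abstractly, like `spark` in the description above.
  def spark: FakeSession
}

// The base can use `spark` in its helpers without knowing who supplies it.
// `new QueryTestLike {}` would not compile because `spark` is still abstract.
abstract class QueryTestLike extends SessionProviderLike {
  def sessionKind: String = spark.kind
}

// Provider that itself extends the base class.
trait SharedSessionLike extends QueryTestLike {
  lazy val spark: FakeSession = FakeSession("classic in-process")
}

// Provider that is mixed in alongside the base class.
trait HiveSingletonLike extends SessionProviderLike {
  lazy val spark: FakeSession = FakeSession("Hive-backed")
}

// Concrete suites pick exactly one provider.
class ClassicSuite extends QueryTestLike with SharedSessionLike
class HiveSuite extends QueryTestLike with HiveSingletonLike
```

Both concrete classes compile because exactly one mixed-in trait gives `spark` a value; the choice of provider decides which kind of session the inherited helpers see.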
The new section also includes:
- A decision table mapping test scope (plain JVM, Catalyst plans,
SQL/DataFrame with a session) to the right base.
- A session-provider table noting that `SharedSparkSession` itself extends
`QueryTest` (so concrete suites just `extends SharedSparkSession`), while
`TestHiveSingleton` is mixed in alongside `QueryTest`.
- A linearization gotcha: the first item in an `extends` clause must
transitively extend a class. Pure helper traits (`*ErrorsBase`, `*Helper`)
cannot be put first.
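The linearization gotcha can be reproduced without Spark. In the sketch below, `BaseSuite`, `SessionTrait` and `ErrorsHelper` are hypothetical stand-ins for a class-based test base, a trait that extends it, and a pure helper trait, respectively.

```scala
// Hypothetical stand-ins; these are not Spark's real classes.
abstract class BaseSuite {
  def name: String = "base"
}

// A trait that transitively extends a class.
trait SessionTrait extends BaseSuite {
  override def name: String = "session+" + super.name
}

// A pure helper trait: it extends no class.
trait ErrorsHelper {
  def expectedError: String = "E001"
}

// Compiles: the first parent is a trait whose superclass is BaseSuite,
// so BaseSuite becomes the superclass of GoodSuite.
class GoodSuite extends SessionTrait with ErrorsHelper

// Rejected by the compiler: with a pure trait first, the superclass is
// AnyRef, which is not a subclass of SessionTrait's superclass BaseSuite.
// class BadSuite extends ErrorsHelper with SessionTrait
```

Swapping the order of the two parents is the whole fix; the helper trait's members are available either way.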
`CLAUDE.md` is a symlink to `AGENTS.md`, so this change is picked up by
both AI agent toolchains.
### Why are the changes needed?
Picking the wrong test base class (e.g. extending `QueryTest` directly when
a session is needed, or `SparkFunSuite` when `PlanTest` would do) is a common
stumble when adding new Scala test suites. The information is currently spread
across the source of `SparkFunSuite`, `PlanTest`, `QueryTest`, and the
session-providing traits, with no single place that summarizes when to use
which. Documenting it in `AGENTS.md` gives both contributors and AI coding
agents a quick reference.
### Does this PR introduce _any_ user-facing change?
No. Documentation-only change to a developer/agent guide file.
### How was this patch tested?
N/A. Documentation-only change; no code or tests are affected.
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude opus-4-7
Closes #55707 from zhengruifeng/add-test-base-class-guide.
Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
(cherry picked from commit 283949ccbb37bd961dbc47b38a6a0445397cabe0)
Signed-off-by: Ruifeng Zheng <[email protected]>
---
AGENTS.md | 27 +++++++++++++++++++++++++++
1 file changed, 27 insertions(+)
diff --git a/AGENTS.md b/AGENTS.md
index 96f5b7917cae..28944c9d7810 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -20,6 +20,33 @@ Spark Connect protocol is defined in proto files under `sql/connect/common/src/m
 Avoid introducing non-ASCII characters in code or comments. String literals may contain non-ASCII when the content requires it (error messages, test data, etc.). Identifiers are ASCII by convention. The common failure mode is typographic characters (em-dash, smart quotes, ellipsis, non-breaking space) sneaking into comments; scalastyle flags some of these. Spot-check before committing: `grep -rn -P "[^\x00-\x7F]" <files>`.
+## Scala Test Base Classes
+
+When writing a new Scala test suite, pick the lowest base class that provides what the test actually needs. Spark uses the `AnyFunSuite` ScalaTest style throughout, so the bases below are the chain to choose from. Each adds capability on top of the previous:
+
+    SparkFunSuite                      (core)
+      <- PlanTest                      (sql/catalyst)
+        <- QueryTest                   (sql/core)
+
+| Test scope | Base | Notes |
+|------------|------|-------|
+| Plain JVM/Scala — no Spark SQL | `SparkFunSuite` | `core` utilities, RDD, network, util classes, etc. Adds per-test timeout, `testRetry`, `gridTest`, thread audit, fixed timezone/locale, `withTempDir`, `withLogAppender`, `checkError`. |
+| Catalyst plan tests — no `SparkSession` | `PlanTest` | Adds `comparePlans`, `normalizePlan`, `normalizeExprIds`. For analyzer / optimizer / planner rule tests. |
+| SQL/DataFrame tests — needs a `SparkSession` | `QueryTest` | Adds `checkAnswer`, codegen-on/off helpers. `spark: SparkSession` is abstract and must be supplied by a session-providing trait (see below). |
+
+### Providing a `SparkSession` for `QueryTest`
+
+`QueryTest` declares `spark: SparkSession` abstractly via `SparkSessionProvider`, so it cannot be instantiated on its own. A concrete suite mixes in one of the session-providing traits below:
+
+    QueryTest                          (abstract `spark`)
+      + SharedSparkSession (sql/core)  -> classic in-process `TestSparkSession`
+      + TestHiveSingleton  (sql/hive)  -> Hive-backed `TestHive` session
+
+| Session provider | Module / location | Typical usage |
+|---|---|---|
+| `SharedSparkSession` | `sql/core` | Already extends `QueryTest` for historical reasons, but still mix in `QueryTest` explicitly, e.g. `class X extends QueryTest with SharedSparkSession`. Default for tests under `sql/core`. |
+| `TestHiveSingleton` | `sql/hive` | Mixed in alongside `QueryTest`, e.g. `class X extends QueryTest with TestHiveSingleton`. Used by tests under `sql/hive`. |
+
## Build and Test
 Build and tests can take a long time. If the user explicitly asked to run tests, run them. Otherwise (you are running tests on your own to verify a change), first ask the user if they have more changes to make.