Re: [PR] test(table): validate variant and unknown types against Spark 4 [iceberg-go]

via GitHub Thu, 11 Jun 2026 11:37:50 -0700


laskoviymishka commented on code in PR #1182:
URL: https://github.com/apache/iceberg-go/pull/1182#discussion_r3398253977



##########
table/transaction_test.go:
##########
@@ -73,6 +73,15 @@ func (s *SparkIntegrationTestSuite) SetupTest() {
        s.cat = cat
 }
 
+func (s *SparkIntegrationTestSuite) requireSpark4() {
+       s.T().Helper()
+       major, err := recipe.SparkMajorVersion(s.T())
+       s.Require().NoError(err, "spark version extraction failed")
+       if major < 4 {
+               s.T().Skipf("requires Spark 4+ (running Spark %d)", major)
+       }
+}
+
 func (s *SparkIntegrationTestSuite) TestSetProperties() {
        icebergSchema := iceberg.NewSchema(0,

Review Comment:
   The skip lands after we've created the table, written, committed, and 
scanned — so on Spark 3.5 this isn't really skipped, it does all the work and 
then bails.
   
   Two problems: the Go-side assertions above run under a test whose skip 
message says "requires Spark 4+", which is misleading; and the write side 
effects are already committed, so a rerun on 3.5 can hit `CreateTable` on an 
existing `go_variant_events`/`go_unknown_table` and fail with a conflict. I'd 
move `s.requireSpark4()` to the first line of both this test and 
`TestUnknownTypeWriteAndScan` (line 144), so nothing executes before the skip 
decision.



##########
internal/recipe/local_spark.go:
##########
@@ -140,3 +142,33 @@ func ExecuteSpark(t *testing.T, scriptPath string, args 
...string) (string, erro
 
        return string(output), nil
 }
+
+var (
+       sparkMajorOnce sync.Once
+       sparkMajor     int
+       sparkMajorErr  error
+)
+
+// SparkMajorVersion returns the major pyspark version (3 or 4) of the

Review Comment:
   Two things tangled together here, both rooted in caching the result keyed 
off `t`.
   
   First, `sync.Once` caches the error permanently. If the container is 
momentarily unavailable on the first call — docker-exec startup timing is flaky 
— `sparkMajorErr` is set for the whole process and every Spark-4-gated test 
afterward fails with "extract spark version", masking the real cause. I'd only 
cache on success: either drop the `Once` and accept one exec per test, or guard 
with a mutex that leaves the result unset on error so the next caller retries.
   
   Second, the closure captures the *first* caller's `*testing.T` and uses its 
context/logging for an exec that later tests depend on. If that first test has 
returned, calling its `t` methods is a data race / "log after test finished" 
panic. I'd detach detection from `t` entirely — drive the one-time check off 
`context.Background()` with a raw exec helper, make `SparkMajorVersion()` take 
no `t`, and have call sites become `recipe.SparkMajorVersion()`. wdyt?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] test(table): validate variant and unknown types against Spark 4 [iceberg-go]

Reply via email to