laskoviymishka commented on code in PR #1182:
URL: https://github.com/apache/iceberg-go/pull/1182#discussion_r3398253977
##########
table/transaction_test.go:
##########
@@ -73,6 +73,15 @@ func (s *SparkIntegrationTestSuite) SetupTest() {
s.cat = cat
}
+func (s *SparkIntegrationTestSuite) requireSpark4() {
+ s.T().Helper()
+ major, err := recipe.SparkMajorVersion(s.T())
+ s.Require().NoError(err, "spark version extraction failed")
+ if major < 4 {
+ s.T().Skipf("requires Spark 4+ (running Spark %d)", major)
+ }
+}
+
func (s *SparkIntegrationTestSuite) TestSetProperties() {
icebergSchema := iceberg.NewSchema(0,
Review Comment:
The skip lands after we've created the table, written, committed, and
scanned — so on Spark 3.5 this isn't really skipped, it does all the work and
then bails.
Two problems: the Go-side assertions above run under a test whose skip
message says "requires Spark 4+", which is misleading; and the write side
effects are already committed, so a rerun on 3.5 can hit `CreateTable` on an
existing `go_variant_events`/`go_unknown_table` and fail with a conflict. I'd
move `s.requireSpark4()` to the first line of both this test and
`TestUnknownTypeWriteAndScan` (line 144), so nothing executes before the skip
decision.
##########
internal/recipe/local_spark.go:
##########
@@ -140,3 +142,33 @@ func ExecuteSpark(t *testing.T, scriptPath string, args
...string) (string, erro
return string(output), nil
}
+
+var (
+ sparkMajorOnce sync.Once
+ sparkMajor int
+ sparkMajorErr error
+)
+
+// SparkMajorVersion returns the major pyspark version (3 or 4) of the
Review Comment:
Two things tangled together here, both rooted in caching the result keyed
off `t`.
First, `sync.Once` caches the error permanently. If the container is
momentarily unavailable on the first call — docker-exec startup timing is flaky
— `sparkMajorErr` is set for the whole process and every Spark-4-gated test
afterward fails with "extract spark version", masking the real cause. I'd only
cache on success: either drop the `Once` and accept one exec per test, or guard
with a mutex that leaves the result unset on error so the next caller retries.
Second, the closure captures the *first* caller's `*testing.T` and uses its
context/logging for an exec that later tests depend on. If that first test has
returned, calling its `t` methods is a data race / "log after test finished"
panic. I'd detach detection from `t` entirely — drive the one-time check off
`context.Background()` with a raw exec helper, make `SparkMajorVersion()` take
no `t`, and have call sites become `recipe.SparkMajorVersion()`. wdyt?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]