One clarification: are the regtests we are talking about here only https://github.com/apache/polaris/tree/main/plugins/spark/v3.5/regtests, or https://github.com/apache/polaris/tree/main/regtests as well? I ask because v3.5/regtests doesn't have cloud dependencies, but the latter does.

Thanks,
Yong Zheng

On 2026/05/29 01:29:24 Yong Zheng wrote:
> Hello Dmitri,
>
> Thanks for the recap. If I understand correctly, we want to merge test
> coverage from `/regtests` (docker based) into `/integration`? Looking at
> them, integration has most of the tests we have in regtests except the ones
> that are real cloud specific, such as spark_sql_gcp* and spark_sql_azure*, as
> well as the pyspark specific ones (t_pyspark). The cloud specific ones are
> not part of CI; in that case, do we still want to convert them? Also, I am
> not sure if we can convert the pyspark code into JUnit directly.
>
> Also, regarding those duplicate classes, should we move them to common? For
> the ones that are very similar with minor differences, should we proceed
> with version-specific adapters (meaning we won't be following what Iceberg
> is doing with different versions of Spark)?
>
> Lastly, should we close the current PR and handle the two items above in
> separate PRs first, before revisiting Spark4 support?
>
> Thanks,
> Yong Zheng
>
> On 2026/05/28 21:07:44 Dmitri Bourlatchkov wrote:
> > Hi All,
> >
> > This is to recap and follow up on today's Community Sync discussion. This
> > matter came up during the review of the Spark 4 PR, but I believe it
> > deserves a dedicated discussion.
> >
> > Observations:
> >
> > * Tests under the /regtests directory take a considerable amount of CI
> > resources. This is mostly due to building custom Docker images.
> >
> > * When adding support for newer Spark versions, developers naturally tend
> > to copy existing test infrastructure, which results in building more Docker
> > images.
> >
> > * The coverage these tests provide probably does not require the heavy
> > docker machinery. These tests validate that the Spark Session can interact
> > with Polaris over the Iceberg REST Catalog Java client. The same coverage
> > can be provided in an isolated JVM running under Gradle more efficiently
> > (without a docker env.)
> >
> > Proposal:
> >
> > * Gradually convert Spark tests under /regtests to Gradle tasks and JUnit
> > tests without Docker.
> >
> > The exact impl. is to be figured out.
> >
> > Most of the tests can run under the JUnit framework using a local Polaris
> > server (same JVM or different JVM depending on use case).
> >
> > True "integration" tests can still run under Gradle as a task that starts a
> > Spark shell in a fresh JVM and executes a small set of SQL commands in it.
> > However, these "big" tests can probably be limited in number and complexity
> > and as such should not generate excessive CI load. More specifically, these
> > tests probably do not need to assert the verbatim output of the SQL
> > commands. A basic sanity check should be sufficient. Functional tests can
> > be performed under JUnit, I think.
> >
> > Thoughts?
> >
> > Thanks,
> > Dmitri.
