One clarification: do the regtests we are discussing here include only
https://github.com/apache/polaris/tree/main/plugins/spark/v3.5/regtests, or
https://github.com/apache/polaris/tree/main/regtests as well? The
v3.5/regtests do not have cloud dependencies, but the latter does.

Thanks,
Yong Zheng

On 2026/05/29 01:29:24 Yong Zheng wrote:
> Hello Dmitri,
> 
> Thanks for the recap. If I understand correctly, we want to merge test
> coverage from `/regtests` (Docker-based) into `/integration`? Looking at
> them, integration has most of the tests we have in regtests, except the ones
> that are specific to real clouds, such as spark_sql_gcp* and
> spark_sql_azure*, as well as the PySpark-specific ones (t_pyspark). The
> cloud-specific ones are not part of CI; in that case, do we still want to
> convert them? Also, I am not sure if we can convert the PySpark code into
> JUnit directly.
> 
> Also, regarding the duplicate classes, should we move them to common? For
> the ones that are very similar but have minor differences, should we proceed
> with version-specific adapters (meaning we won't follow what Iceberg is
> doing with different versions of Spark)?
> 
> Lastly, should we close the current PR and handle the two items above in
> separate PRs first before revisiting Spark 4 support?
> 
> Thanks,
> Yong Zheng
> 
> On 2026/05/28 21:07:44 Dmitri Bourlatchkov wrote:
> > Hi All,
> > 
> > This is to recap and follow up on today's Community Sync discussion. This
> > matter came up during the review of the Spark 4 PR, but I believe it
> > deserves a dedicated discussion.
> > 
> > Observations:
> > 
> > * Tests under the /regtests directory take a considerable amount of CI
> > resources. This is mostly due to building custom Docker images.
> > 
> > * When adding support for newer Spark versions, developers naturally tend
> > to copy existing test infrastructure, which results in building more Docker
> > images.
> > 
> > * The coverage these tests provide probably does not require the heavy
> > docker machinery. These tests validate that the Spark Session can interact
> > with Polaris over the Iceberg REST Catalog Java client. The same coverage
> > can be provided in an isolated JVM running under Gradle more efficiently
> > (without a docker env.)
> > 
> > Proposal:
> > 
> > * Gradually convert Spark tests under /regtests to Gradle tasks and JUnit
> > tests without Docker.
> > 
> > The exact impl. is to be figured out.
> > 
> > Most of the tests can run under the JUnit framework using a local Polaris
> > server (same JVM or different JVM depending on use case).
> > 
> > True "integration" tests can still run under Gradle as a task that starts a
> > Spark shell in a fresh JVM and executes a small set of SQL commands in it.
> > However, these "big" tests can probably be limited in number and complexity
> > and as such should not generate excessive CI load. More specifically, these
> > tests probably do not need to assert the verbatim output of the SQL
> > commands. A basic sanity check should be sufficient. Functional tests can
> > be performed under JUnit, I think.
> > 
> > Thoughts?
> > 
> > Thanks,
> > Dmitri.
> > 
> 
