Hello Dmitri,

Thanks for the recap. If I understand correctly, we want to merge test coverage 
from `/regtests` (Docker-based) into `/integration`? Looking at both, 
integration already has most of the tests we have in regtests, except the ones 
that are specific to real clouds, such as spark_sql_gcp* and spark_sql_azure*, 
as well as the PySpark-specific ones (t_pyspark). The cloud-specific ones are 
not part of CI; in that case, do we still want to convert them? Also, I am not 
sure whether we can convert the PySpark code into JUnit directly.

Also, regarding the duplicate classes, should we move them to common? For 
the ones that are very similar with only minor differences, should we proceed 
with version-specific adapters (meaning we won't be following what Iceberg 
does for different versions of Spark)?

Lastly, should we close the current PR and handle the two items above in 
separate PRs first before revisiting Spark 4 support?

Thanks,
Yong Zheng

On 2026/05/28 21:07:44 Dmitri Bourlatchkov wrote:
> Hi All,
> 
> This is to recap and follow up on today's Community Sync discussion. This
> matter came up during the review of the Spark 4 PR, but I believe it
> deserves a dedicated discussion.
> 
> Observations:
> 
> * Tests under the /regtests directory take a considerable amount of CI
> resources. This is mostly due to building custom Docker images.
> 
> * When adding support for newer Spark versions, developers naturally tend
> to copy existing test infrastructure, which results in building more Docker
> images.
> 
> * The coverage these tests provide probably does not require the heavy
> docker machinery. These tests validate that the Spark Session can interact
> with Polaris over the Iceberg REST Catalog Java client. The same coverage
> can be provided in an isolated JVM running under Gradle more efficiently
> (without a docker env.)
> 
> Proposal:
> 
> * Gradually convert Spark tests under /regtests to Gradle tasks and JUnit
> tests without Docker.
> 
> The exact impl. is to be figured out.
> 
> Most of the tests can run under the JUnit framework using a local Polaris
> server (same JVM or different JVM depending on use case).
> 
> True "integration" tests can still run under Gradle as a task that starts a
> Spark shell in a fresh JVM and executes a small set of SQL commands in it.
> However, these "big" tests can probably be limited in number and complexity
> and as such should not generate excessive CI load. More specifically, these
> tests probably do not need to assert the verbatim output of the SQL
> commands. A basic sanity check should be sufficient. Functional tests can
> be performed under JUnit, I think.
> 
> Thoughts?
> 
> Thanks,
> Dmitri.
> 
