Hi everyone,

I'd like to revisit which Docker images the Iceberg project should
publish. The topic has come up several times [1][2][3][4], and I'd like to
pick the discussion back up here.

So far, the main outcome has been the publication of
apache/iceberg-rest-fixture [5] (100K+ downloads), following a consensus
[2] to limit community-maintained images to the REST fixture and rely on
upstream engine projects for quickstarts. A separate thread and issue
[3][6] proposed replacing the tabulario/spark-iceberg quickstart image with
the official apache/spark image. Most recently, a proposal to add a Flink
quickstart image [4] has reopened the broader question.

One concrete case for expanding scope: both iceberg-python and iceberg-rust
currently maintain their own Spark+Iceberg Docker images for integration
testing, and we already try to keep them in sync manually [7][8]. This is
exactly the kind of duplication that centralizing under the main iceberg
repo would eliminate, just as we did with apache/iceberg-rest-fixture.
Publishing a shared apache/iceberg-spark image would give all subprojects a
single, well-maintained image to depend on, and reduce the maintenance
burden across the ecosystem. We can do the same for the Flink+Iceberg setup.

Given the traction the REST fixture image has seen, I think it's worth
revisiting the scope of what we publish. I'd love to hear updated views
from the community.

Thanks,
Kevin Liu

[1] https://lists.apache.org/thread/dr6nsvd8jm2gr2nn5vf7nkpr0pc5d03h
[2] https://lists.apache.org/thread/xl1cwq7vmnh6zgfd2vck2nq7dfd33ncq
[3] https://lists.apache.org/thread/4kknk8mvnffbmhdt63z8t4ps0mt1jbf4
[4] https://lists.apache.org/thread/grlgvl9fslcxrsnxyb7qqh7vjd4kkqo3
[5] https://hub.docker.com/r/apache/iceberg-rest-fixture
[6] https://github.com/apache/iceberg/issues/13519
[7] https://github.com/apache/iceberg-python/tree/main/dev/spark
[8] https://github.com/apache/iceberg-rust/tree/main/dev/spark
