Hi Kevin, thanks for raising this.

I agree this discussion is warranted. In the previous thread [1] we largely 
deferred making a decision on whether the Iceberg community should publish 
Docker images beyond the REST TCK image, so I think it makes sense to revisit 
it now.

While it's tempting to help out the community in every possible way, I think 
it's important to stay focused on what the project and its subprojects are 
best positioned to own. I'm concerned that publishing engine-specific Iceberg 
images as supported artifacts could create a long-term maintenance burden, 
since we don't maintain those engines ourselves.

From my perspective, the key question is what criteria we should use when 
deciding whether to publish a Docker image. The clearest line, I think, is 
whether the image supports Iceberg subprojects (or other OSS projects) in 
testing their integration with Iceberg, and whether we can reasonably expect 
to support it to a high standard.

I'm curious to hear others' thoughts on this topic.

Cheers,
Sung

[1] https://lists.apache.org/thread/xl1cwq7vmnh6zgfd2vck2nq7dfd33ncq

On 2026/02/19 21:06:56 Kevin Liu wrote:
> Hi everyone,
> 
> I want to continue the discussion on which Docker images the Iceberg
> project should publish. This has come up several times before [1][2][3][4],
> so I'd like to pick it up again here.
> 
> So far, the main outcome has been the publication of
> apache/iceberg-rest-fixture [5] (100K+ downloads), following a consensus
> [2] to limit community-maintained images to the REST fixture and rely on
> upstream engine projects for quickstarts. A separate thread and issue
> [3][6] proposed replacing the tabulario/spark-iceberg quickstart image with
> the official apache/spark image. Most recently, a proposal to add a Flink
> quickstart image [4] has reopened the broader question.
> 
> One concrete case for expanding scope: both iceberg-python and iceberg-rust
> currently maintain their own Spark+Iceberg Docker images for integration
> testing, and we already try to keep them in sync manually [7][8]. This is
> exactly the kind of duplication that centralizing under the main iceberg
> repo would solve, just as we did with apache/iceberg-rest-fixture.
> Publishing a shared apache/iceberg-spark image would give all subprojects a
> single, well-maintained image to depend on, and reduce the maintenance
> burden across the ecosystem. We can do the same for the Flink+Iceberg setup.
> 
> Given the traction the REST fixture image has seen, I think it's worth
> revisiting the scope of what we publish. I'd love to hear updated views
> from the community.
> 
> Thanks,
> Kevin Liu
> 
> [1] https://lists.apache.org/thread/dr6nsvd8jm2gr2nn5vf7nkpr0pc5d03h
> [2] https://lists.apache.org/thread/xl1cwq7vmnh6zgfd2vck2nq7dfd33ncq
> [3] https://lists.apache.org/thread/4kknk8mvnffbmhdt63z8t4ps0mt1jbf4
> [4] https://lists.apache.org/thread/grlgvl9fslcxrsnxyb7qqh7vjd4kkqo3
> [5] https://hub.docker.com/r/apache/iceberg-rest-fixture
> [6] https://github.com/apache/iceberg/issues/13519
> [7] https://github.com/apache/iceberg-python/tree/main/dev/spark
> [8] https://github.com/apache/iceberg-rust/tree/main/dev/spark
> 
