I had the same thought as Peter. For the connectors living in the Iceberg repo (Flink, Kafka Connect, Spark), Iceberg should publish the Docker images, provided we agree that the benefit outweighs the overhead.
Since we already publish a Spark Docker image for the PyIceberg project, it makes more sense to publish it from the Iceberg repository instead of the PyIceberg repository.

On Fri, Feb 20, 2026 at 7:19 AM Kevin Liu <[email protected]> wrote:
>
> > Given this, my suggestion is that Iceberg should publish the quickstart
> > Docker images for integrations we own, like Spark and Flink. For
> > integrations where we don’t own the code, such as Trino and Hive, the
> > respective projects should continue to publish their own images.
>
> +1, this pretty much summarizes my thoughts. And I think it also aligns
> with what Sung mentioned above.
> Publishing Iceberg Spark and Flink Docker images is a great outcome IMO.
> And of course, we have to ensure compliance with ASF policies. :)
>
> The two main use cases I see for expanding our published images are:
> (1) giving users a quick, out-of-the-box way to get started with Iceberg
> on a given engine, and
> (2) providing subprojects (like iceberg-python and iceberg-rust) a shared,
> canonical image to depend on for integration testing, eliminating the
> duplicated maintenance we have today.
>
> Would love to hear what others think.
>
> Best,
> Kevin Liu
>
> On Fri, Feb 20, 2026 at 1:04 AM Péter Váry <[email protected]>
> wrote:
>
>> One important aspect to consider is where the integration code actually
>> lives. Both the Spark and Flink integrations are maintained directly in
>> the Iceberg repository, which means the Iceberg community is responsible
>> for keeping these connectors working. If we moved the Docker image
>> creation into the Spark or Flink projects, we would introduce a circular
>> dependency that would make release coordination much more complicated.
>>
>> For example, imagine Spark releases version 4.2. At that point, no
>> Iceberg integration exists yet. Once we update Iceberg, the support for
>> Spark 4.2 would land in an Iceberg release; let's say Iceberg 1.12.0.
>> At that point, we can publish the iceberg-1.12.0-spark-4.2-quickstart
>> image, aligned with our release cycle. But if the Spark project were
>> responsible for publishing the image, they would need a separate,
>> additional release cycle just for the Docker image, which doesn't fit
>> naturally into their workflow.
>>
>> Given this, my suggestion is that Iceberg should publish the quickstart
>> Docker images for integrations we own, like Spark and Flink. For
>> integrations where we don’t own the code, such as Trino and Hive, the
>> respective projects should continue to publish their own images.
>>
>> On Fri, Feb 20, 2026 at 3:29 AM, Sung Yun <[email protected]> wrote:
>>
>>> Hi Kevin, thanks for raising this.
>>>
>>> I agree this discussion is warranted. In the previous thread [1] we
>>> largely deferred making a decision on whether the Iceberg community
>>> should publish Docker images beyond the REST TCK image, so I think it
>>> makes sense to revisit it now.
>>>
>>> While it's tempting to help out the community in every possible way, I
>>> think it's important to stay focused on what the project and its
>>> subprojects are best positioned to own. I'm concerned that publishing
>>> engine-specific Iceberg images as supported artifacts could create a
>>> long-term maintenance burden, since we don't maintain those engines
>>> ourselves.
>>>
>>> From my perspective, the key question is what criteria we should use
>>> when deciding whether to publish a Docker image. I think the clearest
>>> line is whether an image supports Iceberg subprojects (or other OSS
>>> projects) in testing their integration with Iceberg, where we can
>>> reasonably expect to support it to a high standard.
>>>
>>> I'm curious to hear others' thoughts on this topic.
>>>
>>> Cheers,
>>> Sung
>>>
>>> [1] https://lists.apache.org/thread/xl1cwq7vmnh6zgfd2vck2nq7dfd33ncq
>>>
>>> On 2026/02/19 21:06:56 Kevin Liu wrote:
>>> > Hi everyone,
>>> >
>>> > I want to continue the discussion on which Docker images the Iceberg
>>> > project should publish. This has come up several times [1][2][3][4],
>>> > and I'd like to pick it up again here.
>>> >
>>> > So far, the main outcome has been the publication of
>>> > apache/iceberg-rest-fixture [5] (100K+ downloads), following a
>>> > consensus [2] to limit community-maintained images to the REST
>>> > fixture and rely on upstream engine projects for quickstarts. A
>>> > separate thread and issue [3][6] proposed replacing the
>>> > tabulario/spark-iceberg quickstart image with the official
>>> > apache/spark image. Most recently, a proposal to add a Flink
>>> > quickstart image [4] has reopened the broader question.
>>> >
>>> > One concrete case for expanding scope: both iceberg-python and
>>> > iceberg-rust currently maintain their own Spark+Iceberg Docker images
>>> > for integration testing, and we already try to keep them in sync
>>> > manually [7][8]. This is exactly the kind of duplication that
>>> > centralizing under the main iceberg repo would solve, just as we did
>>> > with apache/iceberg-rest-fixture. Publishing a shared
>>> > apache/iceberg-spark image would give all subprojects a single,
>>> > well-maintained image to depend on, and reduce the maintenance burden
>>> > across the ecosystem. We could do the same for the Flink+Iceberg
>>> > setup.
>>> >
>>> > Given the traction the REST fixture image has seen, I think it's
>>> > worth revisiting the scope of what we publish. I'd love to hear
>>> > updated views from the community.
>>> >
>>> > Thanks,
>>> > Kevin Liu
>>> >
>>> > [1] https://lists.apache.org/thread/dr6nsvd8jm2gr2nn5vf7nkpr0pc5d03h
>>> > [2] https://lists.apache.org/thread/xl1cwq7vmnh6zgfd2vck2nq7dfd33ncq
>>> > [3] https://lists.apache.org/thread/4kknk8mvnffbmhdt63z8t4ps0mt1jbf4
>>> > [4] https://lists.apache.org/thread/grlgvl9fslcxrsnxyb7qqh7vjd4kkqo3
>>> > [5] https://hub.docker.com/r/apache/iceberg-rest-fixture
>>> > [6] https://github.com/apache/iceberg/issues/13519
>>> > [7] https://github.com/apache/iceberg-python/tree/main/dev/spark
>>> > [8] https://github.com/apache/iceberg-rust/tree/main/dev/spark
>>> >
>>>
>>
