kevinjqliu opened a new pull request, #15340:
URL: https://github.com/apache/iceberg/pull/15340

   This is a follow-up to #15124. I noticed an issue when rerunning the 
quickstart docker container (`docker compose -f 
docker/iceberg-flink-quickstart/docker-compose.yml up -d --build`).
   
   ## Repro
   To reproduce, start the quickstart containers with the command above, then 
run the Flink SQL statements below using `docker exec -it jobmanager 
./bin/sql-client.sh`.
   Run the same `docker compose` command a second time and rerun the 
statements; without this PR, `CREATE TABLE` fails.
   Flink SQL:
   ```
   CREATE CATALOG iceberg WITH (
       'type' = 'iceberg',
       'catalog-impl' = 'org.apache.iceberg.rest.RESTCatalog',
       'uri' = 'http://iceberg-rest:8181',
       'io-impl' = 'org.apache.iceberg.aws.s3.S3FileIO',
       's3.endpoint' = 'http://minio:9000'
   );
   
   CREATE DATABASE IF NOT EXISTS `iceberg`.demo;
   
   CREATE TABLE IF NOT EXISTS `iceberg`.`demo`.sample (
       id   BIGINT   COMMENT 'unique id',
       data STRING   COMMENT 'payload',
       ts   TIMESTAMP(3) COMMENT 'event time'
   );
   
   INSERT INTO `iceberg`.`demo`.sample VALUES
       (1, 'alpha',   TIMESTAMP '2026-02-16 10:00:00'),
       (2, 'bravo',   TIMESTAMP '2026-02-16 10:01:00'),
       (3, 'charlie', TIMESTAMP '2026-02-16 10:02:00');
   
   SELECT * FROM `iceberg`.`demo`.sample;
   ```
   
   ## Summary
   
   Fix the Flink quickstart `docker-compose.yml` so that `docker compose up -d 
--build` is safe to rerun without breaking the Iceberg REST catalog.
   
   ## Problem
   
   The `create-bucket` init container ran `mc rm -r --force minio/warehouse` on 
every execution, wiping all S3 data (metadata JSON, Parquet files, Avro 
manifests). However, the Iceberg REST catalog's SQLite database persisted 
inside its running container, leaving it with stale references to deleted 
metadata files. Any subsequent table operation would fail with:
   
   ```
   NotFoundException: Location does not exist: 
s3://warehouse/demo/sample/metadata/00001-....metadata.json
   ```
   
   ## Changes
   
   - **Idempotent bucket creation**: Replace destructive `mc rm -r --force` + 
`mc mb` with `mc mb --ignore-existing` to create the bucket only if it doesn't 
exist
   - **Prevent re-execution on rerun**: Add `tail -f /dev/null` to keep the 
`create-bucket` container alive, so `docker compose up` treats it as already 
running
   - **Healthcheck-gated startup**: Add a healthcheck (`mc ls minio/warehouse`) 
to `create-bucket` and update `iceberg-rest` to depend on `service_healthy`, 
ensuring the bucket is verified to exist before the catalog starts
   - **Fix deprecated CLI**: Replace `mc policy set` with `mc anonymous set` to 
avoid deprecation warnings
   - **Remove redundant retry loop**: The `until` loop in `create-bucket` is no 
longer needed since it now depends on `minio: service_healthy`
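   
   Taken together, the changes listed above might look roughly like the 
following Compose fragment. This is a sketch, not the actual diff: the 
`minio/mc` image, the `admin`/`password` credentials, the `public` policy 
argument, and the healthcheck timings are assumptions, and the `minio` service 
and the remaining `iceberg-rest` settings are omitted.
   
   ```yaml
   services:
     create-bucket:
       image: minio/mc                  # assumed image name
       depends_on:
         minio:
           condition: service_healthy   # replaces the old `until` retry loop
       entrypoint: ["/bin/sh", "-c"]
       command:
         - |
           # Credentials below are placeholders, not taken from the PR.
           mc alias set minio http://minio:9000 admin password &&
           # Create the bucket only if it does not exist; never delete data.
           mc mb --ignore-existing minio/warehouse &&
           # `mc anonymous set` supersedes the deprecated `mc policy set`;
           # the `public` policy is an assumption.
           mc anonymous set public minio/warehouse &&
           # Keep the container running so a rerun of `up` does not re-execute it.
           tail -f /dev/null
       healthcheck:
         test: ["CMD", "mc", "ls", "minio/warehouse"]
         interval: 2s                   # assumed timings
         timeout: 5s
         retries: 15

     iceberg-rest:
       depends_on:
         create-bucket:
           condition: service_healthy   # start only once the bucket is verified
   ```
   
   Gating `iceberg-rest` on the healthcheck rather than on container start 
means the catalog only comes up after `mc ls` has confirmed the bucket exists.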
   
   ## Behavior
   
   | Command | Before | After |
   |---|---|---|
   | `docker compose up -d --build` (first) | Works | Works |
   | `docker compose up -d --build` (rerun) | **Broken**: S3 wiped, catalog has stale refs | Works: no-op, state preserved |
   | `docker compose down && docker compose up -d --build` | Works (fresh start) | Works (fresh start) |
   

