(fluss) branch main updated: [doc/build] Fix iceberg quickstart to build on flink standard image (#2555)

mehulbatra Wed, 04 Feb 2026 23:09:22 -0800

This is an automated email from the ASF dual-hosted git repository.

mehulbatra pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/fluss.git



The following commit(s) were added to refs/heads/main by this push:
     new f92fa2c1b [doc/build] Fix iceberg quickstart to build on flink 
standard image (#2555)
f92fa2c1b is described below

commit f92fa2c1b76bf5e39b6b51adb27fefe6d52aceae
Author: MehulBatra <[email protected]>
AuthorDate: Thu Feb 5 12:39:11 2026 +0530

    [doc/build] Fix iceberg quickstart to build on flink standard image (#2555)
    
    * fix iceberg quickstart
    
    * address yuxia comments
    
    
    ---------
    
    Co-authored-by: Mehul Batra <[email protected]>
---
 fluss-lake/fluss-lake-iceberg/pom.xml |  12 +++
 website/docs/quickstart/lakehouse.md  | 190 ++++++++++++++++++++++++----------
 2 files changed, 147 insertions(+), 55 deletions(-)

diff --git a/fluss-lake/fluss-lake-iceberg/pom.xml 
b/fluss-lake/fluss-lake-iceberg/pom.xml
index 51b0f7cdd..2c4c79d9f 100644
--- a/fluss-lake/fluss-lake-iceberg/pom.xml
+++ b/fluss-lake/fluss-lake-iceberg/pom.xml
@@ -318,12 +318,24 @@
                                     <include>*:*</include>
                                 </includes>
                             </artifactSet>
+                            <relocations>
+                                <relocation>
+                                    <pattern>com.fasterxml.jackson</pattern>
+                                    
<shadedPattern>org.apache.iceberg.shaded.com.fasterxml.jackson</shadedPattern>
+                                </relocation>
+                                <relocation>
+                                    <pattern>org.apache.parquet</pattern>
+                                    
<shadedPattern>org.apache.iceberg.shaded.org.apache.parquet</shadedPattern>
+                                </relocation>
+                            </relocations>
                             <filters>
                                 <filter>
                                     <artifact>*</artifact>
                                     <excludes>
                                         <exclude>LICENSE</exclude>
                                         <exclude>NOTICE</exclude>
+                                        
<exclude>META-INF/versions/21/**</exclude>
+                                        
<exclude>META-INF/versions/17/**</exclude>
                                     </excludes>
                                 </filter>
                             </filters>
diff --git a/website/docs/quickstart/lakehouse.md 
b/website/docs/quickstart/lakehouse.md
index 5845814b7..91785b436 100644
--- a/website/docs/quickstart/lakehouse.md
+++ b/website/docs/quickstart/lakehouse.md
@@ -103,7 +103,7 @@ services:
     volumes:
       - shared-tmpfs:/tmp/paimon
   #end
-  
+
 volumes:
   shared-tmpfs:
     driver: local
@@ -126,7 +126,7 @@ docker compose up -d
 ```
 This command automatically starts all the containers defined in the Docker 
Compose configuration in detached mode.
 
-Run 
+Run
 ```shell
 docker compose ps
 ```
@@ -155,37 +155,41 @@ mkdir fluss-quickstart-iceberg
 cd fluss-quickstart-iceberg
 ```
 
-2. Create a `lib` directory and download the required Hadoop jar file:
+2. Create directories and download required jars:
 
 ```shell
-mkdir lib
+mkdir -p lib opt
+
+# Flink connectors
+wget -O lib/flink-faker-0.5.3.jar 
https://github.com/knaufk/flink-faker/releases/download/v0.5.3/flink-faker-0.5.3.jar
+wget -O "lib/fluss-flink-1.20-$FLUSS_DOCKER_VERSION$.jar" 
"https://repo1.maven.org/maven2/org/apache/fluss/fluss-flink-1.20/$FLUSS_DOCKER_VERSION$/fluss-flink-1.20-$FLUSS_DOCKER_VERSION$.jar";
+wget -O lib/iceberg-flink-runtime-1.20-1.10.1.jar 
"https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-flink-runtime-1.20/1.10.1/iceberg-flink-runtime-1.20-1.10.1.jar";
+
+# Fluss lake plugin
+wget -O "lib/fluss-lake-iceberg-$FLUSS_DOCKER_VERSION$.jar" 
"https://repo1.maven.org/maven2/org/apache/fluss/fluss-lake-iceberg/$FLUSS_DOCKER_VERSION$/fluss-lake-iceberg-$FLUSS_DOCKER_VERSION$.jar";
+
+# Hadoop filesystem support
 wget -O lib/hadoop-apache-3.3.5-2.jar 
https://repo1.maven.org/maven2/io/trino/hadoop/hadoop-apache/3.3.5-2/hadoop-apache-3.3.5-2.jar
-```
+wget -O lib/failsafe-3.3.2.jar 
https://repo1.maven.org/maven2/dev/failsafe/failsafe/3.3.2/failsafe-3.3.2.jar
 
-This jar file provides Hadoop 3.3.5 dependencies required for Iceberg's Hadoop 
catalog integration.
+# Tiering service
+wget -O "opt/fluss-flink-tiering-$FLUSS_DOCKER_VERSION$.jar" 
"https://repo1.maven.org/maven2/org/apache/fluss/fluss-flink-tiering/$FLUSS_DOCKER_VERSION$/fluss-flink-tiering-$FLUSS_DOCKER_VERSION$.jar";
+```
 
 :::info
-The `lib` directory serves as a staging area for additional jars needed by the 
Fluss coordinator server. The docker-compose configuration (see step 3) mounts 
this directory and copies all jars to `/opt/fluss/plugins/iceberg/` inside the 
coordinator container at startup.
-
 You can add more jars to this `lib` directory based on your requirements:
 - **Cloud storage support**: For AWS S3 integration with Iceberg, add the 
corresponding Iceberg bundle jars (e.g., `iceberg-aws-bundle`)
 - **Custom Hadoop configurations**: Add jars for specific HDFS distributions 
or custom authentication mechanisms
 - **Other catalog backends**: Add jars needed for alternative Iceberg catalog 
implementations (e.g., Rest, Hive, Glue)
-
-Any jar placed in the `lib` directory will be automatically loaded by the 
Fluss coordinator server, making it available for Iceberg integration.
 :::
 
 3. Create a `docker-compose.yml` file with the following content:
 
-
 ```yaml
 services:
-  zookeeper:
-    restart: always
-    image: zookeeper:3.9.2
-
   coordinator-server:
     image: apache/fluss:$FLUSS_DOCKER_VERSION$
+    command: coordinatorServer
     depends_on:
       - zookeeper
     environment:
@@ -199,9 +203,9 @@ services:
         datalake.iceberg.warehouse: /tmp/iceberg
     volumes:
       - shared-tmpfs:/tmp/iceberg
-      - ./lib:/tmp/lib
-    entrypoint: ["sh", "-c", "cp -v /tmp/lib/*.jar /opt/fluss/plugins/iceberg/ 
&& exec /docker-entrypoint.sh coordinatorServer"]
-
+      - shared-tmpfs:/tmp/fluss
+      - 
./lib/fluss-lake-iceberg-$FLUSS_DOCKER_VERSION$.jar:/opt/fluss/plugins/iceberg/fluss-lake-iceberg-$FLUSS_DOCKER_VERSION$.jar
+      - 
./lib/hadoop-apache-3.3.5-2.jar:/opt/fluss/plugins/iceberg/hadoop-apache-3.3.5-2.jar
   tablet-server:
     image: apache/fluss:$FLUSS_DOCKER_VERSION$
     command: tabletServer
@@ -220,33 +224,50 @@ services:
         datalake.iceberg.warehouse: /tmp/iceberg
     volumes:
       - shared-tmpfs:/tmp/iceberg
-
+      - shared-tmpfs:/tmp/fluss
+  zookeeper:
+    restart: always
+    image: zookeeper:3.9.2
   jobmanager:
-    image: apache/fluss-quickstart-flink:1.20-$FLUSS_DOCKER_VERSION$
+    image: flink:1.20-scala_2.12-java17
     ports:
       - "8083:8081"
-    command: jobmanager
+    entrypoint: ["/bin/bash", "-c"]
+    command: >
+      "sed -i 's/exec $(drop_privs_cmd)//g' /docker-entrypoint.sh &&
+       cp /tmp/jars/*.jar /opt/flink/lib/ 2>/dev/null || true;
+       cp /tmp/opt/*.jar /opt/flink/opt/ 2>/dev/null || true;
+       /docker-entrypoint.sh jobmanager"
     environment:
       - |
         FLINK_PROPERTIES=
         jobmanager.rpc.address: jobmanager
     volumes:
       - shared-tmpfs:/tmp/iceberg
-
+      - shared-tmpfs:/tmp/fluss
+      - ./lib:/tmp/jars
+      - ./opt:/tmp/opt
   taskmanager:
-    image: apache/fluss-quickstart-flink:1.20-$FLUSS_DOCKER_VERSION$
+    image: flink:1.20-scala_2.12-java17
     depends_on:
       - jobmanager
-    command: taskmanager
+    entrypoint: ["/bin/bash", "-c"]
+    command: >
+      "sed -i 's/exec $(drop_privs_cmd)//g' /docker-entrypoint.sh &&
+       cp /tmp/jars/*.jar /opt/flink/lib/ 2>/dev/null || true;
+       cp /tmp/opt/*.jar /opt/flink/opt/ 2>/dev/null || true;
+       /docker-entrypoint.sh taskmanager"
     environment:
       - |
         FLINK_PROPERTIES=
         jobmanager.rpc.address: jobmanager
         taskmanager.numberOfTaskSlots: 10
         taskmanager.memory.process.size: 2048m
-        taskmanager.memory.framework.off-heap.size: 256m
     volumes:
       - shared-tmpfs:/tmp/iceberg
+      - shared-tmpfs:/tmp/fluss
+      - ./lib:/tmp/jars
+      - ./opt:/tmp/opt
 
 volumes:
   shared-tmpfs:
@@ -260,11 +281,7 @@ The Docker Compose environment consists of the following 
containers:
 - **Fluss Cluster:** a Fluss `CoordinatorServer`, a Fluss `TabletServer` and a 
`ZooKeeper` server.
 - **Flink Cluster**: a Flink `JobManager` and a Flink `TaskManager` container 
to execute queries.
 
-**Note:** The `apache/fluss-quickstart-flink` image is based on 
[flink:1.20.3-java17](https://hub.docker.com/layers/library/flink/1.20-java17/images/sha256:296c7c23fa40a9a3547771b08fc65e25f06bc4cfd3549eee243c99890778cafc)
 and
-includes the [fluss-flink](engine-flink/getting-started.md), 
[iceberg-flink](https://iceberg.apache.org/docs/latest/flink/) and
-[flink-connector-faker](https://flink-packages.org/packages/flink-faker) to 
simplify this guide.
-
-3. To start all containers, run:
+4. To start all containers, run:
 ```shell
 docker compose up -d
 ```
@@ -272,7 +289,7 @@ This command automatically starts all the containers 
defined in the Docker Compo
 
 Run
 ```shell
-docker container ls -a
+docker compose ps
 ```
 to check whether all containers are running properly.
 
@@ -280,7 +297,6 @@ You can also visit http://localhost:8083/ to see if Flink 
is running normally.
 
 :::note
 - If you want to additionally use an observability stack, follow one of the 
provided quickstart guides [here](maintenance/observability/quickstart.md) and 
then continue with this guide.
-- If you want to run with your own Flink environment, remember to download the 
[fluss-flink connector jar](/downloads), 
[flink-connector-faker](https://github.com/knaufk/flink-faker/releases) and 
[iceberg-flink connector jar](https://iceberg.apache.org/docs/latest/flink/) 
and then put them to `FLINK_HOME/lib/`.
 - All the following commands involving `docker compose` should be executed in 
the created working directory that contains the `docker-compose.yml` file.
 :::
 
@@ -290,6 +306,10 @@ Congratulations, you are all set!
 </Tabs>
 
 ## Enter into SQL-Client
+
+<Tabs groupId="lake-tabs">
+  <TabItem value="paimon" label="Paimon" default>
+
 First, use the following command to enter the Flink SQL CLI Container:
 ```shell
 docker compose exec jobmanager ./sql-client
@@ -311,6 +331,80 @@ SHOW CREATE TABLE source_order;
 SHOW CREATE TABLE source_nation;
 ```
 
+  </TabItem>
+
+  <TabItem value="iceberg" label="Iceberg">
+
+First, use the following command to enter the Flink SQL CLI Container:
+```shell
+docker compose exec jobmanager ./bin/sql-client.sh
+```
+
+To simplify this guide, we will create three temporary tables with `faker` 
connector to generate data:
+
+```sql title="Flink SQL"
+CREATE TEMPORARY TABLE source_order (
+    `order_key` BIGINT,
+    `cust_key` INT,
+    `total_price` DECIMAL(15, 2),
+    `order_date` DATE,
+    `order_priority` STRING,
+    `clerk` STRING
+) WITH (
+  'connector' = 'faker',
+  'rows-per-second' = '10',
+  'number-of-rows' = '10000',
+  'fields.order_key.expression' = '#{number.numberBetween 
''0'',''100000000''}',
+  'fields.cust_key.expression' = '#{number.numberBetween ''0'',''20''}',
+  'fields.total_price.expression' = '#{number.randomDouble 
''3'',''1'',''1000''}',
+  'fields.order_date.expression' = '#{date.past ''100'' ''DAYS''}',
+  'fields.order_priority.expression' = '#{regexify ''(low|medium|high){1}''}',
+  'fields.clerk.expression' = '#{regexify 
''(Clerk1|Clerk2|Clerk3|Clerk4){1}''}'
+);
+```
+
+```sql title="Flink SQL"
+CREATE TEMPORARY TABLE source_customer (
+    `cust_key` INT,
+    `name` STRING,
+    `phone` STRING,
+    `nation_key` INT NOT NULL,
+    `acctbal` DECIMAL(15, 2),
+    `mktsegment` STRING,
+    PRIMARY KEY (`cust_key`) NOT ENFORCED
+) WITH (
+  'connector' = 'faker',
+  'number-of-rows' = '200',
+  'fields.cust_key.expression' = '#{number.numberBetween ''0'',''20''}',
+  'fields.name.expression' = '#{funnyName.name}',
+  'fields.nation_key.expression' = '#{number.numberBetween ''1'',''5''}',
+  'fields.phone.expression' = '#{phoneNumber.cellPhone}',
+  'fields.acctbal.expression' = '#{number.randomDouble ''3'',''1'',''1000''}',
+  'fields.mktsegment.expression' = '#{regexify 
''(AUTOMOBILE|BUILDING|FURNITURE|MACHINERY|HOUSEHOLD){1}''}'
+);
+```
+
+```sql title="Flink SQL"
+CREATE TEMPORARY TABLE `source_nation` (
+  `nation_key` INT NOT NULL,
+  `name`       STRING,
+   PRIMARY KEY (`nation_key`) NOT ENFORCED
+) WITH (
+  'connector' = 'faker',
+  'number-of-rows' = '100',
+  'fields.nation_key.expression' = '#{number.numberBetween ''1'',''5''}',
+  'fields.name.expression' = '#{regexify 
''(CANADA|JORDAN|CHINA|UNITED|INDIA){1}''}'
+);
+```
+
+```sql title="Flink SQL"
+-- drop records silently if a null value would have to be inserted into a NOT 
NULL column
+SET 'table.exec.sink.not-null-enforcer'='DROP';
+```
+
+  </TabItem>
+</Tabs>
+
 
 ## Create Fluss Tables
 ### Create Fluss Catalog
@@ -428,8 +522,8 @@ END;
 <Tabs groupId="lake-tabs">
   <TabItem value="paimon" label="Paimon" default>
 
-To integrate with [Apache Paimon](https://paimon.apache.org/), you need to 
start the `Lakehouse Tiering Service`. 
-Open a new terminal, navigate to the `fluss-quickstart-flink` directory, and 
execute the following command within this directory to start the service:
+To integrate with [Apache Paimon](https://paimon.apache.org/), you need to 
start the `Lakehouse Tiering Service`.
+Open a new terminal, navigate to the `fluss-quickstart-paimon` directory, and 
execute the following command within this directory to start the service:
 ```shell
 docker compose exec jobmanager \
     /opt/flink/bin/flink run \
@@ -446,11 +540,11 @@ You should see a Flink Job to tier data from Fluss to 
Paimon running in the [Fli
   <TabItem value="iceberg" label="Iceberg">
 
 To integrate with [Apache Iceberg](https://iceberg.apache.org/), you need to 
start the `Lakehouse Tiering Service`.
-Open a new terminal, navigate to the `fluss-quickstart-flink-iceberg` 
directory, and execute the following command within this directory to start the 
service:
+Open a new terminal, navigate to the `fluss-quickstart-iceberg` directory, and 
execute the following command within this directory to start the service:
 ```shell
 docker compose exec jobmanager \
     /opt/flink/bin/flink run \
-    /opt/flink/opt/fluss-flink-tiering-$FLUSS_VERSION$.jar \
+    /opt/flink/opt/fluss-flink-tiering.jar \
     --fluss.bootstrap.servers coordinator-server:9123 \
     --datalake.format iceberg \
     --datalake.iceberg.type hadoop \
@@ -491,10 +585,6 @@ CREATE TABLE datalake_enriched_orders (
 ```
 
 Next, perform streaming data writing into the **datalake-enabled** table, 
`datalake_enriched_orders`:
-```sql  title="Flink SQL"
--- switch to streaming mode
-SET 'execution.runtime-mode' = 'streaming';
-```
 
 ```sql  title="Flink SQL"
 -- insert tuples into datalake_enriched_orders
@@ -676,6 +766,11 @@ If you wish to query only the data stored in 
Iceberg—offering high-performance
 This approach also enables all the optimizations and features of a Flink 
Iceberg table source, including [system 
table](https://iceberg.apache.org/docs/latest/flink-queries/#inspecting-tables) 
such as `datalake_enriched_orders$lake$snapshots`.
 
 
+```sql  title="Flink SQL"
+-- use tableau result mode
+SET 'sql-client.execution.result-mode' = 'tableau';
+```
+
 ```sql  title="Flink SQL"
 -- switch to batch mode
 SET 'execution.runtime-mode' = 'batch';
@@ -730,21 +825,6 @@ SELECT sum(total_price) as sum_price FROM 
datalake_enriched_orders;
 
 You can execute the real-time analytics query multiple times, and the results 
will vary with each run as new data is continuously written to Fluss in 
real-time.
 
-Finally, you can use the following command to view the files stored in Iceberg:
-```shell
-docker compose exec taskmanager tree /tmp/iceberg/fluss
-```
-
-**Sample Output:**
-```shell
-/tmp/iceberg/fluss
-└── datalake_enriched_orders
-    ├── data
-    │   └── 00000-0-abc123.parquet
-    └── metadata
-        ├── snap-1234567890123456789-1-abc123.avro
-        └── v1.metadata.json
-```
 The files adhere to Iceberg's standard format, enabling seamless querying with 
other engines such as 
[Spark](https://iceberg.apache.org/docs/latest/spark-queries/) and 
[Trino](https://trino.io/docs/current/connector/iceberg.html).
 
   </TabItem>

(fluss) branch main updated: [doc/build] Fix iceberg quickstart to build on flink standard image (#2555)

Reply via email to