This is an automated email from the ASF dual-hosted git repository. jark pushed a commit to branch release-0.9 in repository https://gitbox.apache.org/repos/asf/fluss.git
commit 55e57d3d720b8d67b49a8432334f740b2b9b525a Author: Keith Lee <[email protected]> AuthorDate: Wed Feb 11 02:09:10 2026 +0000 [docs] Update Quickstart to Use S3 (via RustFS) Instead of Local File System (#2569) --- docker/quickstart-flink/prepare_build.sh | 4 +- pom.xml | 2 +- website/docs/quickstart/flink.md | 96 ++++++++-- website/docs/quickstart/lakehouse.md | 304 +++++++++++++++++++++++-------- 4 files changed, 318 insertions(+), 88 deletions(-) diff --git a/docker/quickstart-flink/prepare_build.sh b/docker/quickstart-flink/prepare_build.sh index 3dd9f964a..a2904b05d 100755 --- a/docker/quickstart-flink/prepare_build.sh +++ b/docker/quickstart-flink/prepare_build.sh @@ -76,7 +76,7 @@ download_jar() { log_info "Downloading $description..." # Download the file - if ! wget -O "$dest_file" "$url"; then + if ! curl -fL -o "$dest_file" "$url"; then log_error "Failed to download $description from $url" return 1 fi @@ -258,4 +258,4 @@ show_summary() { } # Run main function -main "$@" \ No newline at end of file +main "$@" diff --git a/pom.xml b/pom.xml index e223b86e8..7322422f5 100644 --- a/pom.xml +++ b/pom.xml @@ -1139,4 +1139,4 @@ </pluginManagement> </build> -</project> \ No newline at end of file +</project> diff --git a/website/docs/quickstart/flink.md b/website/docs/quickstart/flink.md index a3dc859a7..b5fd0f93c 100644 --- a/website/docs/quickstart/flink.md +++ b/website/docs/quickstart/flink.md @@ -33,7 +33,7 @@ mkdir fluss-quickstart-flink cd fluss-quickstart-flink ``` -2. Create a `lib` directory and download the required jar files. You can adjust the Flink version as needed. Please make sure to download the compatible versions of [fluss-flink connector jar](/downloads) and [flink-connector-faker](https://github.com/knaufk/flink-faker/releases) +2. Create a `lib` directory and download the required jar files. You can adjust the Flink version as needed. 
Please make sure to download the compatible versions of [fluss-flink connector jar](/downloads), [fluss-fs-s3 jar](/downloads), and [flink-connector-faker](https://github.com/knaufk/flink-faker/releases) ```shell export FLINK_VERSION="1.20" @@ -41,26 +41,58 @@ export FLINK_VERSION="1.20" ```shell mkdir lib -wget -O lib/flink-faker-0.5.3.jar https://github.com/knaufk/flink-faker/releases/download/v0.5.3/flink-faker-0.5.3.jar -wget -O "lib/fluss-flink-${FLINK_VERSION}-$FLUSS_DOCKER_VERSION$.jar" "https://repo1.maven.org/maven2/org/apache/fluss/fluss-flink-${FLINK_VERSION}/$FLUSS_DOCKER_VERSION$/fluss-flink-${FLINK_VERSION}-$FLUSS_DOCKER_VERSION$.jar" +curl -fL -o lib/flink-faker-0.5.3.jar https://github.com/knaufk/flink-faker/releases/download/v0.5.3/flink-faker-0.5.3.jar +curl -fL -o "lib/fluss-flink-${FLINK_VERSION}-$FLUSS_DOCKER_VERSION$.jar" "https://repo1.maven.org/maven2/org/apache/fluss/fluss-flink-${FLINK_VERSION}/$FLUSS_DOCKER_VERSION$/fluss-flink-${FLINK_VERSION}-$FLUSS_DOCKER_VERSION$.jar" +curl -fL -o "lib/fluss-fs-s3-$FLUSS_DOCKER_VERSION$.jar" "https://repo1.maven.org/maven2/org/apache/fluss/fluss-fs-s3/$FLUSS_DOCKER_VERSION$/fluss-fs-s3-$FLUSS_DOCKER_VERSION$.jar" ``` 3. 
Create a `docker-compose.yml` file with the following content: ```yaml services: + #begin RustFS (S3-compatible storage) + rustfs: + image: rustfs/rustfs:latest + ports: + - "9000:9000" + - "9001:9001" + environment: + - RUSTFS_ACCESS_KEY=rustfsadmin + - RUSTFS_SECRET_KEY=rustfsadmin + - RUSTFS_CONSOLE_ENABLE=true + volumes: + - rustfs-data:/data + command: /data + rustfs-init: + image: minio/mc + depends_on: + - rustfs + entrypoint: > + /bin/sh -c " + until mc alias set rustfs http://rustfs:9000 rustfsadmin rustfsadmin; do + echo 'Waiting for RustFS...'; + sleep 1; + done; + mc mb --ignore-existing rustfs/fluss; + " + #end #begin Fluss cluster coordinator-server: image: apache/fluss:$FLUSS_DOCKER_VERSION$ command: coordinatorServer depends_on: - zookeeper + - rustfs-init environment: - | FLUSS_PROPERTIES= zookeeper.address: zookeeper:2181 bind.listeners: FLUSS://coordinator-server:9123 - remote.data.dir: /tmp/fluss/remote-data + remote.data.dir: s3://fluss/remote-data + s3.endpoint: http://rustfs:9000 + s3.access-key: rustfsadmin + s3.secret-key: rustfsadmin + s3.path-style-access: true tablet-server: image: apache/fluss:$FLUSS_DOCKER_VERSION$ command: tabletServer @@ -72,8 +104,12 @@ services: zookeeper.address: zookeeper:2181 bind.listeners: FLUSS://tablet-server:9123 data.dir: /tmp/fluss/data - remote.data.dir: /tmp/fluss/remote-data - kv.snapshot.interval: 0s + remote.data.dir: s3://fluss/remote-data + s3.endpoint: http://rustfs:9000 + s3.access-key: rustfsadmin + s3.secret-key: rustfsadmin + s3.path-style-access: true + kv.snapshot.interval: 60s zookeeper: restart: always image: zookeeper:3.9.2 @@ -112,33 +148,43 @@ services: - | FLINK_PROPERTIES= jobmanager.rpc.address: jobmanager - rest.address: jobmanager + rest.address: jobmanager entrypoint: ["sh", "-c", "cp -v /tmp/lib/*.jar /opt/flink/lib && exec /docker-entrypoint.sh bin/sql-client.sh"] volumes: - ./lib:/tmp/lib #end + +volumes: + rustfs-data: ``` The Docker Compose environment consists of the 
following containers: +- **RustFS:** an S3-compatible object store used for tiered storage. You can access the RustFS console at http://localhost:9001 with credentials `rustfsadmin/rustfsadmin`. An init container (`rustfs-init`) automatically creates the `fluss` bucket on startup. - **Fluss Cluster:** a Fluss `CoordinatorServer`, a Fluss `TabletServer` and a `ZooKeeper` server. + - The snapshot interval (`kv.snapshot.interval`) is configured as 60 seconds. You may want to tune this for production systems. + - Credentials are configured directly with `s3.access-key` and `s3.secret-key`. Production systems should use a credentials provider chain appropriate for their cloud environment. - **Flink Cluster**: a Flink `JobManager`, a Flink `TaskManager`, and a Flink SQL client container to execute queries. -3. To start all containers, run: +:::tip +[RustFS](https://github.com/rustfs/rustfs) is used as a replacement for S3 in this quickstart example; for a production setup you may want to configure a cloud file system instead. See [here](/maintenance/filesystems/overview.md) for information on how to set up cloud file systems. +::: + +4. To start all containers, run: ```shell docker compose up -d ``` This command automatically starts all the containers defined in the Docker Compose configuration in detached mode. -Run +Run ```shell docker compose ps ``` to check whether all containers are running properly. -You can also visit http://localhost:8083/ to see if Flink is running normally. +5. Verify the setup. You can visit http://localhost:8083/ to see if Flink is running normally. The S3 bucket for Fluss tiered storage is automatically created by the `rustfs-init` service. You can access the RustFS console at http://localhost:9001 with credentials `rustfsadmin/rustfsadmin` to view the `fluss` bucket. 
:::note -- If you want to additionally use an observability stack, follow one of the provided quickstart guides [here](maintenance/observability/quickstart.md) and then continue with this guide. +- If you want to additionally use an observability stack, follow one of the provided quickstart guides [here](/docs/maintenance/observability/quickstart.md) and then continue with this guide. - All the following commands involving `docker compose` should be executed in the created working directory that contains the `docker-compose.yml` file. ::: @@ -399,6 +445,34 @@ The following SQL query should return an empty result. SELECT * FROM fluss_customer WHERE `cust_key` = 1; ``` +### Quitting SQL Client + +The following command quits the Flink SQL Client. +```sql title="Flink SQL" +quit; +``` + +### Remote Storage + +Finally, you can use the following command to view the Primary Key Table snapshot files stored on RustFS: + +```shell +docker run --rm --net=host \ +-e MC_HOST_rustfs=http://rustfsadmin:rustfsadmin@localhost:9000 \ +minio/mc ls --recursive rustfs/fluss/ +``` + +Sample output: +```shell +[2026-02-03 20:28:59 UTC] 26KiB STANDARD remote-data/kv/fluss/enriched_orders-3/0/shared/4f675202-e560-4b8e-9af4-08e9769b4797 +[2026-02-03 20:27:59 UTC] 11KiB STANDARD remote-data/kv/fluss/enriched_orders-3/0/shared/87447c34-81d0-4be5-b4c8-abcea5ce68e9 +[2026-02-03 20:28:59 UTC] 0B STANDARD remote-data/kv/fluss/enriched_orders-3/0/snap-0/ +[2026-02-03 20:28:59 UTC] 1.1KiB STANDARD remote-data/kv/fluss/enriched_orders-3/0/snap-1/_METADATA +[2026-02-03 20:28:59 UTC] 211B STANDARD remote-data/kv/fluss/enriched_orders-3/0/snap-1/aaffa8fc-ddb3-4754-938a-45e28df6d975 +[2026-02-03 20:28:59 UTC] 16B STANDARD remote-data/kv/fluss/enriched_orders-3/0/snap-1/d3c18e43-11ee-4e39-912d-087ca01de0e8 +[2026-02-03 20:28:59 UTC] 6.2KiB STANDARD remote-data/kv/fluss/enriched_orders-3/0/snap-1/ea2f2097-aa9a-4c2a-9e72-530218cd551c +``` + ## Clean up After finishing the tutorial, run `exit` to 
exit Flink SQL CLI Container and then run ```shell diff --git a/website/docs/quickstart/lakehouse.md b/website/docs/quickstart/lakehouse.md index f99175fc2..5a848c502 100644 --- a/website/docs/quickstart/lakehouse.md +++ b/website/docs/quickstart/lakehouse.md @@ -38,26 +38,28 @@ cd fluss-quickstart-paimon mkdir -p lib opt # Flink connectors -wget -O lib/flink-faker-0.5.3.jar https://github.com/knaufk/flink-faker/releases/download/v0.5.3/flink-faker-0.5.3.jar -wget -O "lib/fluss-flink-1.20-$FLUSS_DOCKER_VERSION$.jar" "https://repo1.maven.org/maven2/org/apache/fluss/fluss-flink-1.20/$FLUSS_DOCKER_VERSION$/fluss-flink-1.20-$FLUSS_DOCKER_VERSION$.jar" -wget -O "lib/paimon-flink-1.20-$PAIMON_VERSION$.jar" "https://repo1.maven.org/maven2/org/apache/paimon/paimon-flink-1.20/$PAIMON_VERSION$/paimon-flink-1.20-$PAIMON_VERSION$.jar" +curl -fL -o lib/flink-faker-0.5.3.jar https://github.com/knaufk/flink-faker/releases/download/v0.5.3/flink-faker-0.5.3.jar +curl -fL -o "lib/fluss-flink-1.20-$FLUSS_DOCKER_VERSION$.jar" "https://repo1.maven.org/maven2/org/apache/fluss/fluss-flink-1.20/$FLUSS_DOCKER_VERSION$/fluss-flink-1.20-$FLUSS_DOCKER_VERSION$.jar" +curl -fL -o "lib/paimon-flink-1.20-$PAIMON_VERSION$.jar" "https://repo1.maven.org/maven2/org/apache/paimon/paimon-flink-1.20/$PAIMON_VERSION$/paimon-flink-1.20-$PAIMON_VERSION$.jar" # Fluss lake plugin -wget -O "lib/fluss-lake-paimon-$FLUSS_DOCKER_VERSION$.jar" "https://repo1.maven.org/maven2/org/apache/fluss/fluss-lake-paimon/$FLUSS_DOCKER_VERSION$/fluss-lake-paimon-$FLUSS_DOCKER_VERSION$.jar" +curl -fL -o "lib/fluss-lake-paimon-$FLUSS_DOCKER_VERSION$.jar" "https://repo1.maven.org/maven2/org/apache/fluss/fluss-lake-paimon/$FLUSS_DOCKER_VERSION$/fluss-lake-paimon-$FLUSS_DOCKER_VERSION$.jar" # Paimon bundle jar -wget -O "lib/paimon-bundle-$PAIMON_VERSION$.jar" "https://repo.maven.apache.org/maven2/org/apache/paimon/paimon-bundle/$PAIMON_VERSION$/paimon-bundle-$PAIMON_VERSION$.jar" +curl -fL -o 
"lib/paimon-bundle-$PAIMON_VERSION$.jar" "https://repo.maven.apache.org/maven2/org/apache/paimon/paimon-bundle/$PAIMON_VERSION$/paimon-bundle-$PAIMON_VERSION$.jar" # Hadoop bundle jar -wget -O lib/flink-shaded-hadoop-2-uber-2.8.3-10.0.jar https://repo.maven.apache.org/maven2/org/apache/flink/flink-shaded-hadoop-2-uber/2.8.3-10.0/flink-shaded-hadoop-2-uber-2.8.3-10.0.jar +curl -fL -o lib/flink-shaded-hadoop-2-uber-2.8.3-10.0.jar https://repo.maven.apache.org/maven2/org/apache/flink/flink-shaded-hadoop-2-uber/2.8.3-10.0/flink-shaded-hadoop-2-uber-2.8.3-10.0.jar + +# AWS S3 support +curl -fL -o "lib/paimon-s3-$PAIMON_VERSION$.jar" "https://repo.maven.apache.org/maven2/org/apache/paimon/paimon-s3/$PAIMON_VERSION$/paimon-s3-$PAIMON_VERSION$.jar" # Tiering service -wget -O "opt/fluss-flink-tiering-$FLUSS_DOCKER_VERSION$.jar" "https://repo1.maven.org/maven2/org/apache/fluss/fluss-flink-tiering/$FLUSS_DOCKER_VERSION$/fluss-flink-tiering-$FLUSS_DOCKER_VERSION$.jar" +curl -fL -o "opt/fluss-flink-tiering-$FLUSS_DOCKER_VERSION$.jar" "https://repo1.maven.org/maven2/org/apache/fluss/fluss-flink-tiering/$FLUSS_DOCKER_VERSION$/fluss-flink-tiering-$FLUSS_DOCKER_VERSION$.jar" ``` :::info You can add more jars to this `lib` directory based on your requirements: -- **Cloud storage support**: For AWS S3 integration with Paimon, add the corresponding [paimon-s3](https://repo.maven.apache.org/maven2/org/apache/paimon/paimon-s3/$PAIMON_VERSION$/paimon-s3-$PAIMON_VERSION$.jar) - **Other catalog backends**: Add jars needed for alternative Paimon catalog implementations (e.g., Hive, JDBC) ::: @@ -65,23 +67,59 @@ You can add more jars to this `lib` directory based on your requirements: ```yaml services: + #begin RustFS (S3-compatible storage) + rustfs: + image: rustfs/rustfs:latest + ports: + - "9000:9000" + - "9001:9001" + environment: + - RUSTFS_ACCESS_KEY=rustfsadmin + - RUSTFS_SECRET_KEY=rustfsadmin + - RUSTFS_CONSOLE_ENABLE=true + volumes: + - rustfs-data:/data + command: /data + 
rustfs-init: + image: minio/mc + depends_on: + - rustfs + entrypoint: > + /bin/sh -c " + until mc alias set rustfs http://rustfs:9000 rustfsadmin rustfsadmin; do + echo 'Waiting for RustFS...'; + sleep 1; + done; + mc mb --ignore-existing rustfs/fluss; + " + #end coordinator-server: image: apache/fluss:$FLUSS_DOCKER_VERSION$ command: coordinatorServer depends_on: - - zookeeper + zookeeper: + condition: service_started + rustfs-init: + condition: service_completed_successfully environment: - | FLUSS_PROPERTIES= zookeeper.address: zookeeper:2181 bind.listeners: FLUSS://coordinator-server:9123 - remote.data.dir: /tmp/fluss/remote-data + remote.data.dir: s3://fluss/remote-data + s3.endpoint: http://rustfs:9000 + s3.access-key: rustfsadmin + s3.secret-key: rustfsadmin + s3.path.style.access: true datalake.format: paimon datalake.paimon.metastore: filesystem - datalake.paimon.warehouse: /tmp/paimon + datalake.paimon.warehouse: s3://fluss/paimon + datalake.paimon.s3.endpoint: http://rustfs:9000 + datalake.paimon.s3.access-key: rustfsadmin + datalake.paimon.s3.secret-key: rustfsadmin + datalake.paimon.s3.path.style.access: true volumes: - - shared-tmpfs:/tmp/paimon - - shared-tmpfs:/tmp/fluss + - ./lib/paimon-s3-$PAIMON_VERSION$.jar:/opt/fluss/plugins/paimon/paimon-s3-$PAIMON_VERSION$.jar tablet-server: image: apache/fluss:$FLUSS_DOCKER_VERSION$ command: tabletServer @@ -93,14 +131,21 @@ services: zookeeper.address: zookeeper:2181 bind.listeners: FLUSS://tablet-server:9123 data.dir: /tmp/fluss/data - remote.data.dir: /tmp/fluss/remote-data + remote.data.dir: s3://fluss/remote-data + s3.endpoint: http://rustfs:9000 + s3.access-key: rustfsadmin + s3.secret-key: rustfsadmin + s3.path.style.access: true kv.snapshot.interval: 0s datalake.format: paimon datalake.paimon.metastore: filesystem - datalake.paimon.warehouse: /tmp/paimon + datalake.paimon.warehouse: s3://fluss/paimon + datalake.paimon.s3.endpoint: http://rustfs:9000 + datalake.paimon.s3.access-key: rustfsadmin + 
datalake.paimon.s3.secret-key: rustfsadmin + datalake.paimon.s3.path.style.access: true volumes: - - shared-tmpfs:/tmp/paimon - - shared-tmpfs:/tmp/fluss + - ./lib/paimon-s3-$PAIMON_VERSION$.jar:/opt/fluss/plugins/paimon/paimon-s3-$PAIMON_VERSION$.jar zookeeper: restart: always image: zookeeper:3.9.2 @@ -110,8 +155,7 @@ services: - "8083:8081" entrypoint: ["/bin/bash", "-c"] command: > - "sed -i 's/exec $(drop_privs_cmd)//g' /docker-entrypoint.sh && - cp /tmp/jars/*.jar /opt/flink/lib/ 2>/dev/null || true; + "cp /tmp/jars/*.jar /opt/flink/lib/ 2>/dev/null || true; cp /tmp/opt/*.jar /opt/flink/opt/ 2>/dev/null || true; /docker-entrypoint.sh jobmanager" environment: @@ -119,8 +163,6 @@ services: FLINK_PROPERTIES= jobmanager.rpc.address: jobmanager volumes: - - shared-tmpfs:/tmp/paimon - - shared-tmpfs:/tmp/fluss - ./lib:/tmp/jars - ./opt:/tmp/opt taskmanager: @@ -129,8 +171,7 @@ services: - jobmanager entrypoint: ["/bin/bash", "-c"] command: > - "sed -i 's/exec $(drop_privs_cmd)//g' /docker-entrypoint.sh && - cp /tmp/jars/*.jar /opt/flink/lib/ 2>/dev/null || true; + "cp /tmp/jars/*.jar /opt/flink/lib/ 2>/dev/null || true; cp /tmp/opt/*.jar /opt/flink/opt/ 2>/dev/null || true; /docker-entrypoint.sh taskmanager" environment: @@ -139,23 +180,24 @@ services: jobmanager.rpc.address: jobmanager taskmanager.numberOfTaskSlots: 10 taskmanager.memory.process.size: 2048m + taskmanager.memory.task.off-heap.size: 128m volumes: - - shared-tmpfs:/tmp/paimon - - shared-tmpfs:/tmp/fluss - ./lib:/tmp/jars - ./opt:/tmp/opt volumes: - shared-tmpfs: - driver: local - driver_opts: - type: "tmpfs" - device: "tmpfs" + rustfs-data: ``` The Docker Compose environment consists of the following containers: - **Fluss Cluster:** a Fluss `CoordinatorServer`, a Fluss `TabletServer` and a `ZooKeeper` server. - **Flink Cluster**: a Flink `JobManager` and a Flink `TaskManager` container to execute queries. 
+- **RustFS**: an S3-compatible storage system used both as Fluss remote storage and Paimon's filesystem warehouse. + + +:::tip +[RustFS](https://github.com/rustfs/rustfs) is used as a replacement for S3 in this quickstart example; for a production setup you may want to configure a cloud file system instead. See [here](/maintenance/filesystems/overview.md) for information on how to set up cloud file systems. +::: 4. To start all containers, run: ```shell docker compose up -d ``` @@ -198,25 +240,30 @@ cd fluss-quickstart-iceberg mkdir -p lib opt # Flink connectors -wget -O lib/flink-faker-0.5.3.jar https://github.com/knaufk/flink-faker/releases/download/v0.5.3/flink-faker-0.5.3.jar -wget -O "lib/fluss-flink-1.20-$FLUSS_DOCKER_VERSION$.jar" "https://repo1.maven.org/maven2/org/apache/fluss/fluss-flink-1.20/$FLUSS_DOCKER_VERSION$/fluss-flink-1.20-$FLUSS_DOCKER_VERSION$.jar" -wget -O lib/iceberg-flink-runtime-1.20-1.10.1.jar "https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-flink-runtime-1.20/1.10.1/iceberg-flink-runtime-1.20-1.10.1.jar" +curl -fL -o lib/flink-faker-0.5.3.jar https://github.com/knaufk/flink-faker/releases/download/v0.5.3/flink-faker-0.5.3.jar +curl -fL -o "lib/fluss-flink-1.20-$FLUSS_DOCKER_VERSION$.jar" "https://repo1.maven.org/maven2/org/apache/fluss/fluss-flink-1.20/$FLUSS_DOCKER_VERSION$/fluss-flink-1.20-$FLUSS_DOCKER_VERSION$.jar" +curl -fL -o lib/iceberg-flink-runtime-1.20-1.10.1.jar https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-flink-runtime-1.20/1.10.1/iceberg-flink-runtime-1.20-1.10.1.jar # Fluss lake plugin -wget -O "lib/fluss-lake-iceberg-$FLUSS_DOCKER_VERSION$.jar" "https://repo1.maven.org/maven2/org/apache/fluss/fluss-lake-iceberg/$FLUSS_DOCKER_VERSION$/fluss-lake-iceberg-$FLUSS_DOCKER_VERSION$.jar" +curl -fL -o "lib/fluss-lake-iceberg-$FLUSS_DOCKER_VERSION$.jar" "https://repo1.maven.org/maven2/org/apache/fluss/fluss-lake-iceberg/$FLUSS_DOCKER_VERSION$/fluss-lake-iceberg-$FLUSS_DOCKER_VERSION$.jar" + +# Iceberg AWS support (S3FileIO + AWS SDK)
+curl -fL -o lib/iceberg-aws-1.10.1.jar https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-aws/1.10.1/iceberg-aws-1.10.1.jar +curl -fL -o lib/iceberg-aws-bundle-1.10.1.jar https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-aws-bundle/1.10.1/iceberg-aws-bundle-1.10.1.jar -# Hadoop filesystem support -wget -O lib/hadoop-apache-3.3.5-2.jar https://repo1.maven.org/maven2/io/trino/hadoop/hadoop-apache/3.3.5-2/hadoop-apache-3.3.5-2.jar -wget -O lib/failsafe-3.3.2.jar https://repo1.maven.org/maven2/dev/failsafe/failsafe/3.3.2/failsafe-3.3.2.jar +# JDBC catalog driver +curl -fL -o lib/postgresql-42.7.4.jar https://repo1.maven.org/maven2/org/postgresql/postgresql/42.7.4/postgresql-42.7.4.jar + +# Hadoop client (required by Iceberg's Flink integration) +curl -fL -o lib/hadoop-client-api-3.3.5.jar https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-client-api/3.3.5/hadoop-client-api-3.3.5.jar +curl -fL -o lib/hadoop-client-runtime-3.3.5.jar https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-client-runtime/3.3.5/hadoop-client-runtime-3.3.5.jar # Tiering service -wget -O "opt/fluss-flink-tiering-$FLUSS_DOCKER_VERSION$.jar" "https://repo1.maven.org/maven2/org/apache/fluss/fluss-flink-tiering/$FLUSS_DOCKER_VERSION$/fluss-flink-tiering-$FLUSS_DOCKER_VERSION$.jar" +curl -fL -o "opt/fluss-flink-tiering-$FLUSS_DOCKER_VERSION$.jar" "https://repo1.maven.org/maven2/org/apache/fluss/fluss-flink-tiering/$FLUSS_DOCKER_VERSION$/fluss-flink-tiering-$FLUSS_DOCKER_VERSION$.jar" ``` :::info You can add more jars to this `lib` directory based on your requirements: -- **Cloud storage support**: For AWS S3 integration with Iceberg, add the corresponding Iceberg bundle jars (e.g., `iceberg-aws-bundle`) -- **Custom Hadoop configurations**: Add jars for specific HDFS distributions or custom authentication mechanisms - **Other catalog backends**: Add jars needed for alternative Iceberg catalog implementations (e.g., Rest, Hive, Glue) ::: @@ -224,25 +271,81 @@ You can add more 
jars to this `lib` directory based on your requirements: ```yaml services: + #begin RustFS (S3-compatible storage) + rustfs: + image: rustfs/rustfs:latest + ports: + - "9000:9000" + - "9001:9001" + environment: + - RUSTFS_ACCESS_KEY=rustfsadmin + - RUSTFS_SECRET_KEY=rustfsadmin + - RUSTFS_CONSOLE_ENABLE=true + volumes: + - rustfs-data:/data + command: /data + rustfs-init: + image: minio/mc + depends_on: + - rustfs + entrypoint: > + /bin/sh -c " + until mc alias set rustfs http://rustfs:9000 rustfsadmin rustfsadmin; do + echo 'Waiting for RustFS...'; + sleep 1; + done; + mc mb --ignore-existing rustfs/fluss; + " + #end + postgres: + image: postgres:17 + environment: + - POSTGRES_USER=iceberg + - POSTGRES_PASSWORD=iceberg + - POSTGRES_DB=iceberg + healthcheck: + test: ["CMD-SHELL", "pg_isready -U iceberg"] + interval: 3s + timeout: 3s + retries: 5 coordinator-server: image: apache/fluss:$FLUSS_DOCKER_VERSION$ command: coordinatorServer depends_on: - - zookeeper + postgres: + condition: service_healthy + zookeeper: + condition: service_started + rustfs-init: + condition: service_completed_successfully environment: - | FLUSS_PROPERTIES= zookeeper.address: zookeeper:2181 bind.listeners: FLUSS://coordinator-server:9123 - remote.data.dir: /tmp/fluss/remote-data + remote.data.dir: s3://fluss/remote-data + s3.endpoint: http://rustfs:9000 + s3.access-key: rustfsadmin + s3.secret-key: rustfsadmin + s3.path.style.access: true datalake.format: iceberg - datalake.iceberg.type: hadoop - datalake.iceberg.warehouse: /tmp/iceberg + datalake.iceberg.catalog-impl: org.apache.iceberg.jdbc.JdbcCatalog + datalake.iceberg.name: fluss_catalog + datalake.iceberg.uri: jdbc:postgresql://postgres:5432/iceberg + datalake.iceberg.jdbc.user: iceberg + datalake.iceberg.jdbc.password: iceberg + datalake.iceberg.warehouse: s3://fluss/iceberg + datalake.iceberg.io-impl: org.apache.iceberg.aws.s3.S3FileIO + datalake.iceberg.s3.endpoint: http://rustfs:9000 + datalake.iceberg.s3.access-key-id: 
rustfsadmin + datalake.iceberg.s3.secret-access-key: rustfsadmin + datalake.iceberg.s3.path-style-access: true + datalake.iceberg.client.region: us-east-1 volumes: - - shared-tmpfs:/tmp/iceberg - - shared-tmpfs:/tmp/fluss - ./lib/fluss-lake-iceberg-$FLUSS_DOCKER_VERSION$.jar:/opt/fluss/plugins/iceberg/fluss-lake-iceberg-$FLUSS_DOCKER_VERSION$.jar - - ./lib/hadoop-apache-3.3.5-2.jar:/opt/fluss/plugins/iceberg/hadoop-apache-3.3.5-2.jar + - ./lib/iceberg-aws-1.10.1.jar:/opt/fluss/plugins/iceberg/iceberg-aws-1.10.1.jar + - ./lib/iceberg-aws-bundle-1.10.1.jar:/opt/fluss/plugins/iceberg/iceberg-aws-bundle-1.10.1.jar + - ./lib/postgresql-42.7.4.jar:/opt/fluss/plugins/iceberg/postgresql-42.7.4.jar tablet-server: image: apache/fluss:$FLUSS_DOCKER_VERSION$ command: tabletServer @@ -254,14 +357,25 @@ services: zookeeper.address: zookeeper:2181 bind.listeners: FLUSS://tablet-server:9123 data.dir: /tmp/fluss/data - remote.data.dir: /tmp/fluss/remote-data kv.snapshot.interval: 0s + remote.data.dir: s3://fluss/remote-data + s3.endpoint: http://rustfs:9000 + s3.access-key: rustfsadmin + s3.secret-key: rustfsadmin + s3.path.style.access: true datalake.format: iceberg - datalake.iceberg.type: hadoop - datalake.iceberg.warehouse: /tmp/iceberg - volumes: - - shared-tmpfs:/tmp/iceberg - - shared-tmpfs:/tmp/fluss + datalake.iceberg.catalog-impl: org.apache.iceberg.jdbc.JdbcCatalog + datalake.iceberg.name: fluss_catalog + datalake.iceberg.uri: jdbc:postgresql://postgres:5432/iceberg + datalake.iceberg.jdbc.user: iceberg + datalake.iceberg.jdbc.password: iceberg + datalake.iceberg.warehouse: s3://fluss/iceberg + datalake.iceberg.io-impl: org.apache.iceberg.aws.s3.S3FileIO + datalake.iceberg.s3.endpoint: http://rustfs:9000 + datalake.iceberg.s3.access-key-id: rustfsadmin + datalake.iceberg.s3.secret-access-key: rustfsadmin + datalake.iceberg.s3.path-style-access: true + datalake.iceberg.client.region: us-east-1 zookeeper: restart: always image: zookeeper:3.9.2 @@ -271,8 +385,7 @@ services: 
- "8083:8081" entrypoint: ["/bin/bash", "-c"] command: > - "sed -i 's/exec $(drop_privs_cmd)//g' /docker-entrypoint.sh && - cp /tmp/jars/*.jar /opt/flink/lib/ 2>/dev/null || true; + "cp /tmp/jars/*.jar /opt/flink/lib/ 2>/dev/null || true; cp /tmp/opt/*.jar /opt/flink/opt/ 2>/dev/null || true; /docker-entrypoint.sh jobmanager" environment: @@ -280,8 +393,6 @@ services: FLINK_PROPERTIES= jobmanager.rpc.address: jobmanager volumes: - - shared-tmpfs:/tmp/iceberg - - shared-tmpfs:/tmp/fluss - ./lib:/tmp/jars - ./opt:/tmp/opt taskmanager: @@ -290,8 +401,7 @@ services: - jobmanager entrypoint: ["/bin/bash", "-c"] command: > - "sed -i 's/exec $(drop_privs_cmd)//g' /docker-entrypoint.sh && - cp /tmp/jars/*.jar /opt/flink/lib/ 2>/dev/null || true; + "cp /tmp/jars/*.jar /opt/flink/lib/ 2>/dev/null || true; cp /tmp/opt/*.jar /opt/flink/opt/ 2>/dev/null || true; /docker-entrypoint.sh taskmanager" environment: @@ -300,23 +410,24 @@ services: jobmanager.rpc.address: jobmanager taskmanager.numberOfTaskSlots: 10 taskmanager.memory.process.size: 2048m + taskmanager.memory.task.off-heap.size: 128m volumes: - - shared-tmpfs:/tmp/iceberg - - shared-tmpfs:/tmp/fluss - ./lib:/tmp/jars - ./opt:/tmp/opt volumes: - shared-tmpfs: - driver: local - driver_opts: - type: "tmpfs" - device: "tmpfs" + rustfs-data: ``` The Docker Compose environment consists of the following containers: - **Fluss Cluster:** a Fluss `CoordinatorServer`, a Fluss `TabletServer` and a `ZooKeeper` server. - **Flink Cluster**: a Flink `JobManager` and a Flink `TaskManager` container to execute queries. +- **PostgreSQL**: stores Iceberg catalog metadata (used by `JdbcCatalog`). +- **RustFS**: an S3-compatible storage system used both as Fluss remote storage and Iceberg's filesystem warehouse. + +:::tip +[RustFS](https://github.com/rustfs/rustfs) is used as a replacement for S3 in this quickstart example; for a production setup you may want to configure a cloud file system instead. 
See [here](/maintenance/filesystems/overview.md) for information on how to set up cloud file systems. +::: 4. To start all containers, run: ```shell @@ -492,12 +603,33 @@ SET 'table.exec.sink.not-null-enforcer'='DROP'; ## Create Fluss Tables ### Create Fluss Catalog Use the following SQL to create a Fluss catalog: + +<Tabs groupId="lake-tabs"> + <TabItem value="paimon" label="Paimon" default> + +```sql title="Flink SQL" +CREATE CATALOG fluss_catalog WITH ( + 'type' = 'fluss', + 'bootstrap.servers' = 'coordinator-server:9123', + 'paimon.s3.access-key' = 'rustfsadmin', + 'paimon.s3.secret-key' = 'rustfsadmin' +); +``` + </TabItem> + + <TabItem value="iceberg" label="Iceberg"> + ```sql title="Flink SQL" CREATE CATALOG fluss_catalog WITH ( 'type' = 'fluss', - 'bootstrap.servers' = 'coordinator-server:9123' + 'bootstrap.servers' = 'coordinator-server:9123', + 'iceberg.jdbc.password' = 'iceberg', + 'iceberg.s3.access-key-id' = 'rustfsadmin', + 'iceberg.s3.secret-access-key' = 'rustfsadmin' ); ``` + </TabItem> +</Tabs> ```sql title="Flink SQL" USE CATALOG fluss_catalog; ``` @@ -614,7 +746,11 @@ docker compose exec jobmanager \ --fluss.bootstrap.servers coordinator-server:9123 \ --datalake.format paimon \ --datalake.paimon.metastore filesystem \ - --datalake.paimon.warehouse /tmp/paimon + --datalake.paimon.warehouse s3://fluss/paimon \ + --datalake.paimon.s3.endpoint http://rustfs:9000 \ + --datalake.paimon.s3.access.key rustfsadmin \ + --datalake.paimon.s3.secret.key rustfsadmin \ + --datalake.paimon.s3.path.style.access true ``` You should see a Flink Job to tier data from Fluss to Paimon running in the [Flink Web UI](http://localhost:8083/). 
@@ -627,11 +763,21 @@ Open a new terminal, navigate to the `fluss-quickstart-iceberg` directory, and e ```shell docker compose exec jobmanager \ /opt/flink/bin/flink run \ - /opt/flink/opt/fluss-flink-tiering.jar \ + /opt/flink/opt/fluss-flink-tiering-$FLUSS_DOCKER_VERSION$.jar \ --fluss.bootstrap.servers coordinator-server:9123 \ --datalake.format iceberg \ - --datalake.iceberg.type hadoop \ - --datalake.iceberg.warehouse /tmp/iceberg + --datalake.iceberg.catalog-impl org.apache.iceberg.jdbc.JdbcCatalog \ + --datalake.iceberg.name fluss_catalog \ + --datalake.iceberg.uri "jdbc:postgresql://postgres:5432/iceberg" \ + --datalake.iceberg.jdbc.user iceberg \ + --datalake.iceberg.jdbc.password iceberg \ + --datalake.iceberg.warehouse "s3://fluss/iceberg" \ + --datalake.iceberg.io-impl org.apache.iceberg.aws.s3.S3FileIO \ + --datalake.iceberg.s3.endpoint "http://rustfs:9000" \ + --datalake.iceberg.s3.access-key-id rustfsadmin \ + --datalake.iceberg.s3.secret-access-key rustfsadmin \ + --datalake.iceberg.s3.path-style-access true \ + --datalake.iceberg.client.region us-east-1 ``` You should see a Flink Job to tier data from Fluss to Iceberg running in the [Flink Web UI](http://localhost:8083/). @@ -888,9 +1034,19 @@ The files adhere to Iceberg's standard format, enabling seamless querying with o </TabItem> </Tabs> +### Quitting SQL Client + +The following command quits the Flink SQL Client. +```sql title="Flink SQL" +quit; +``` + +### Tiered Storage + +You can visit http://localhost:9001/ and sign in with `rustfsadmin` / `rustfsadmin` to view the files stored on tiered storage. + ## Clean up -After finishing the tutorial, run `exit` to exit Flink SQL CLI Container and then run +Run the following to stop all containers. ```shell docker compose down -v -``` -to stop all containers. \ No newline at end of file
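A note on one setting the updated quickstarts enable everywhere: path-style access (`s3.path-style-access` / `s3.path.style.access`). Path-style addressing puts the bucket in the URL path instead of in the hostname, which is what a single-host S3-compatible endpoint like the quickstart's RustFS needs, since `fluss.rustfs` is not a resolvable DNS name. A minimal sketch of the two addressing styles, using the endpoint and bucket from the quickstart (the object key is a made-up example):

```shell
# Build both S3 URL styles for the same object.
# Endpoint and bucket come from the quickstart compose file;
# the object key is hypothetical, for illustration only.
endpoint="http://rustfs:9000"
bucket="fluss"
key="remote-data/kv/snap-1/_METADATA"

# Path-style: the bucket is a path segment -- works against any single host.
path_style="${endpoint}/${bucket}/${key}"

# Virtual-hosted-style: the bucket becomes a hostname prefix -- requires
# DNS that resolves "<bucket>.<host>", which a local container setup lacks.
virtual_hosted="http://${bucket}.${endpoint#http://}/${key}"

echo "path-style:     ${path_style}"
echo "virtual-hosted: ${virtual_hosted}"
```

With path-style enabled, S3 clients send requests in the first form, so only the single `rustfs:9000` host name has to resolve.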
