This is an automated email from the ASF dual-hosted git repository.
jark pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/fluss.git
The following commit(s) were added to refs/heads/main by this push:
new 9635f0386 [docs] Update Quickstart to Use S3 (via RustFS) Instead of
Local File System (#2569)
9635f0386 is described below
commit 9635f03869bce4e009c95febf5a4492bbd586427
Author: Keith Lee <[email protected]>
AuthorDate: Wed Feb 11 02:09:10 2026 +0000
[docs] Update Quickstart to Use S3 (via RustFS) Instead of Local File
System (#2569)
---
docker/quickstart-flink/prepare_build.sh | 4 +-
pom.xml | 2 +-
website/docs/quickstart/flink.md | 96 ++++++++--
website/docs/quickstart/lakehouse.md | 304 +++++++++++++++++++++++--------
4 files changed, 318 insertions(+), 88 deletions(-)
diff --git a/docker/quickstart-flink/prepare_build.sh
b/docker/quickstart-flink/prepare_build.sh
index 3dd9f964a..a2904b05d 100755
--- a/docker/quickstart-flink/prepare_build.sh
+++ b/docker/quickstart-flink/prepare_build.sh
@@ -76,7 +76,7 @@ download_jar() {
log_info "Downloading $description..."
# Download the file
- if ! wget -O "$dest_file" "$url"; then
+ if ! curl -fL -o "$dest_file" "$url"; then
log_error "Failed to download $description from $url"
return 1
fi
@@ -258,4 +258,4 @@ show_summary() {
}
# Run main function
-main "$@"
\ No newline at end of file
+main "$@"
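[Editor's note] The switch from `wget -O` to `curl -fL` is more than cosmetic: `-f` makes curl exit non-zero on HTTP errors instead of saving the error page to the destination file, and `-L` follows redirects (GitHub release assets redirect to a CDN). A minimal sketch of the resulting pattern, with illustrative names:

```shell
# Sketch of the download pattern now used by prepare_build.sh
# (function and message names here are illustrative, not the script's).
download() {
  url=$1
  dest=$2
  # -f: fail on HTTP errors instead of saving the error page as the jar
  # -L: follow redirects (GitHub release assets redirect to a CDN)
  if ! curl -fL -o "$dest" "$url"; then
    echo "Failed to download $url" >&2
    return 1
  fi
}
```

With plain `wget -O` (or `curl` without `-f`), a 404 HTML page can silently end up in `lib/` and only fail later at class-loading time.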
diff --git a/pom.xml b/pom.xml
index 72baefc91..f68629306 100644
--- a/pom.xml
+++ b/pom.xml
@@ -1139,4 +1139,4 @@
</pluginManagement>
</build>
-</project>
\ No newline at end of file
+</project>
diff --git a/website/docs/quickstart/flink.md b/website/docs/quickstart/flink.md
index a3dc859a7..b5fd0f93c 100644
--- a/website/docs/quickstart/flink.md
+++ b/website/docs/quickstart/flink.md
@@ -33,7 +33,7 @@ mkdir fluss-quickstart-flink
cd fluss-quickstart-flink
```
-2. Create a `lib` directory and download the required jar files. You can
adjust the Flink version as needed. Please make sure to download the compatible
versions of [fluss-flink connector jar](/downloads) and
[flink-connector-faker](https://github.com/knaufk/flink-faker/releases)
+2. Create a `lib` directory and download the required jar files. You can
adjust the Flink version as needed. Please make sure to download the compatible
versions of [fluss-flink connector jar](/downloads), [fluss-fs-s3
jar](/downloads), and
[flink-connector-faker](https://github.com/knaufk/flink-faker/releases)
```shell
export FLINK_VERSION="1.20"
@@ -41,26 +41,58 @@ export FLINK_VERSION="1.20"
```shell
mkdir lib
-wget -O lib/flink-faker-0.5.3.jar
https://github.com/knaufk/flink-faker/releases/download/v0.5.3/flink-faker-0.5.3.jar
-wget -O "lib/fluss-flink-${FLINK_VERSION}-$FLUSS_DOCKER_VERSION$.jar"
"https://repo1.maven.org/maven2/org/apache/fluss/fluss-flink-${FLINK_VERSION}/$FLUSS_DOCKER_VERSION$/fluss-flink-${FLINK_VERSION}-$FLUSS_DOCKER_VERSION$.jar"
+curl -fL -o lib/flink-faker-0.5.3.jar
https://github.com/knaufk/flink-faker/releases/download/v0.5.3/flink-faker-0.5.3.jar
+curl -fL -o "lib/fluss-flink-${FLINK_VERSION}-$FLUSS_DOCKER_VERSION$.jar"
"https://repo1.maven.org/maven2/org/apache/fluss/fluss-flink-${FLINK_VERSION}/$FLUSS_DOCKER_VERSION$/fluss-flink-${FLINK_VERSION}-$FLUSS_DOCKER_VERSION$.jar"
+curl -fL -o "lib/fluss-fs-s3-$FLUSS_DOCKER_VERSION$.jar"
"https://repo1.maven.org/maven2/org/apache/fluss/fluss-fs-s3/$FLUSS_DOCKER_VERSION$/fluss-fs-s3-$FLUSS_DOCKER_VERSION$.jar"
```
3. Create a `docker-compose.yml` file with the following content:
```yaml
services:
+ #begin RustFS (S3-compatible storage)
+ rustfs:
+ image: rustfs/rustfs:latest
+ ports:
+ - "9000:9000"
+ - "9001:9001"
+ environment:
+ - RUSTFS_ACCESS_KEY=rustfsadmin
+ - RUSTFS_SECRET_KEY=rustfsadmin
+ - RUSTFS_CONSOLE_ENABLE=true
+ volumes:
+ - rustfs-data:/data
+ command: /data
+ rustfs-init:
+ image: minio/mc
+ depends_on:
+ - rustfs
+ entrypoint: >
+ /bin/sh -c "
+ until mc alias set rustfs http://rustfs:9000 rustfsadmin rustfsadmin; do
+ echo 'Waiting for RustFS...';
+ sleep 1;
+ done;
+ mc mb --ignore-existing rustfs/fluss;
+ "
+ #end
#begin Fluss cluster
coordinator-server:
image: apache/fluss:$FLUSS_DOCKER_VERSION$
command: coordinatorServer
depends_on:
- zookeeper
+ - rustfs-init
environment:
- |
FLUSS_PROPERTIES=
zookeeper.address: zookeeper:2181
bind.listeners: FLUSS://coordinator-server:9123
- remote.data.dir: /tmp/fluss/remote-data
+ remote.data.dir: s3://fluss/remote-data
+ s3.endpoint: http://rustfs:9000
+ s3.access-key: rustfsadmin
+ s3.secret-key: rustfsadmin
+ s3.path.style.access: true
tablet-server:
image: apache/fluss:$FLUSS_DOCKER_VERSION$
command: tabletServer
@@ -72,8 +104,12 @@ services:
zookeeper.address: zookeeper:2181
bind.listeners: FLUSS://tablet-server:9123
data.dir: /tmp/fluss/data
- remote.data.dir: /tmp/fluss/remote-data
- kv.snapshot.interval: 0s
+ remote.data.dir: s3://fluss/remote-data
+ s3.endpoint: http://rustfs:9000
+ s3.access-key: rustfsadmin
+ s3.secret-key: rustfsadmin
+ s3.path.style.access: true
+ kv.snapshot.interval: 60s
zookeeper:
restart: always
image: zookeeper:3.9.2
@@ -112,33 +148,43 @@ services:
- |
FLINK_PROPERTIES=
jobmanager.rpc.address: jobmanager
- rest.address: jobmanager
+ rest.address: jobmanager
entrypoint: ["sh", "-c", "cp -v /tmp/lib/*.jar /opt/flink/lib && exec
/docker-entrypoint.sh bin/sql-client.sh"]
volumes:
- ./lib:/tmp/lib
#end
+
+volumes:
+ rustfs-data:
```
The Docker Compose environment consists of the following containers:
+- **RustFS:** an S3-compatible object store used for tiered storage. You can
access the RustFS console at http://localhost:9001 with credentials
`rustfsadmin/rustfsadmin`. An init container (`rustfs-init`) automatically
creates the `fluss` bucket on startup.
- **Fluss Cluster:** a Fluss `CoordinatorServer`, a Fluss `TabletServer` and a
`ZooKeeper` server.
+ - The snapshot interval (`kv.snapshot.interval`) is set to 60 seconds; you
may want to tune this for production systems.
+ - Credentials are configured directly with `s3.access-key` and
`s3.secret-key`; production systems should use a credentials provider chain
appropriate to their cloud environment.
- **Flink Cluster**: a Flink `JobManager`, a Flink `TaskManager`, and a Flink
SQL client container to execute queries.
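[Editor's note] The path-style access setting matters because RustFS (like MinIO) serves buckets under the endpoint path rather than as DNS subdomains. A sketch of the difference in URL construction (illustrative only, not Fluss internals):

```shell
# Illustrative only: how an S3 client builds a request URL.
# Path-style keeps the bucket in the URL path, which local S3-compatible
# stores like RustFS expect; virtual-hosted style puts the bucket in the
# hostname, which needs wildcard DNS and fails against http://rustfs:9000.
s3_object_url() {
  endpoint=$1 bucket=$2 key=$3 path_style=$4
  scheme=${endpoint%%://*}
  host=${endpoint#*://}
  if [ "$path_style" = true ]; then
    echo "$scheme://$host/$bucket/$key"
  else
    echo "$scheme://$bucket.$host/$key"
  fi
}

s3_object_url http://rustfs:9000 fluss remote-data/obj true
# -> http://rustfs:9000/fluss/remote-data/obj
```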
-3. To start all containers, run:
+:::tip
+[RustFS](https://github.com/rustfs/rustfs) is used as a replacement for S3 in
this quickstart. For a production setup you may want to use a cloud file
system instead; see [here](/maintenance/filesystems/overview.md) for
information on how to set up cloud file systems.
+:::
+
+4. To start all containers, run:
```shell
docker compose up -d
```
This command automatically starts all the containers defined in the Docker
Compose configuration in detached mode.
-Run
+Run
```shell
docker compose ps
```
to check whether all containers are running properly.
-You can also visit http://localhost:8083/ to see if Flink is running normally.
+5. Verify the setup. You can visit http://localhost:8083/ to see if Flink is
running normally. The S3 bucket for Fluss tiered storage is automatically
created by the `rustfs-init` service. You can access the RustFS console at
http://localhost:9001 with credentials `rustfsadmin/rustfsadmin` to view the
`fluss` bucket.
:::note
-- If you want to additionally use an observability stack, follow one of the
provided quickstart guides [here](maintenance/observability/quickstart.md) and
then continue with this guide.
+- If you want to additionally use an observability stack, follow one of the
provided quickstart guides
[here](/docs/maintenance/observability/quickstart.md) and then continue with
this guide.
- All the following commands involving `docker compose` should be executed in
the created working directory that contains the `docker-compose.yml` file.
:::
@@ -399,6 +445,34 @@ The following SQL query should return an empty result.
SELECT * FROM fluss_customer WHERE `cust_key` = 1;
```
+### Quitting the SQL Client
+
+The following command quits the Flink SQL client.
+```sql title="Flink SQL"
+quit;
+```
+
+### Remote Storage
+
+Finally, you can use the following command to view the Primary Key Table
snapshot files stored on RustFS:
+
+```shell
+docker run --rm --net=host \
+-e MC_HOST_rustfs=http://rustfsadmin:rustfsadmin@localhost:9000 \
+minio/mc ls --recursive rustfs/fluss/
+```
+
+Sample output:
+```shell
+[2026-02-03 20:28:59 UTC] 26KiB STANDARD
remote-data/kv/fluss/enriched_orders-3/0/shared/4f675202-e560-4b8e-9af4-08e9769b4797
+[2026-02-03 20:27:59 UTC] 11KiB STANDARD
remote-data/kv/fluss/enriched_orders-3/0/shared/87447c34-81d0-4be5-b4c8-abcea5ce68e9
+[2026-02-03 20:28:59 UTC] 0B STANDARD
remote-data/kv/fluss/enriched_orders-3/0/snap-0/
+[2026-02-03 20:28:59 UTC] 1.1KiB STANDARD
remote-data/kv/fluss/enriched_orders-3/0/snap-1/_METADATA
+[2026-02-03 20:28:59 UTC] 211B STANDARD
remote-data/kv/fluss/enriched_orders-3/0/snap-1/aaffa8fc-ddb3-4754-938a-45e28df6d975
+[2026-02-03 20:28:59 UTC] 16B STANDARD
remote-data/kv/fluss/enriched_orders-3/0/snap-1/d3c18e43-11ee-4e39-912d-087ca01de0e8
+[2026-02-03 20:28:59 UTC] 6.2KiB STANDARD
remote-data/kv/fluss/enriched_orders-3/0/snap-1/ea2f2097-aa9a-4c2a-9e72-530218cd551c
+```
+
## Clean up
After finishing the tutorial, run `exit` to exit Flink SQL CLI Container and
then run
```shell
diff --git a/website/docs/quickstart/lakehouse.md
b/website/docs/quickstart/lakehouse.md
index f99175fc2..5a848c502 100644
--- a/website/docs/quickstart/lakehouse.md
+++ b/website/docs/quickstart/lakehouse.md
@@ -38,26 +38,28 @@ cd fluss-quickstart-paimon
mkdir -p lib opt
# Flink connectors
-wget -O lib/flink-faker-0.5.3.jar
https://github.com/knaufk/flink-faker/releases/download/v0.5.3/flink-faker-0.5.3.jar
-wget -O "lib/fluss-flink-1.20-$FLUSS_DOCKER_VERSION$.jar"
"https://repo1.maven.org/maven2/org/apache/fluss/fluss-flink-1.20/$FLUSS_DOCKER_VERSION$/fluss-flink-1.20-$FLUSS_DOCKER_VERSION$.jar"
-wget -O "lib/paimon-flink-1.20-$PAIMON_VERSION$.jar"
"https://repo1.maven.org/maven2/org/apache/paimon/paimon-flink-1.20/$PAIMON_VERSION$/paimon-flink-1.20-$PAIMON_VERSION$.jar"
+curl -fL -o lib/flink-faker-0.5.3.jar
https://github.com/knaufk/flink-faker/releases/download/v0.5.3/flink-faker-0.5.3.jar
+curl -fL -o "lib/fluss-flink-1.20-$FLUSS_DOCKER_VERSION$.jar"
"https://repo1.maven.org/maven2/org/apache/fluss/fluss-flink-1.20/$FLUSS_DOCKER_VERSION$/fluss-flink-1.20-$FLUSS_DOCKER_VERSION$.jar"
+curl -fL -o "lib/paimon-flink-1.20-$PAIMON_VERSION$.jar"
"https://repo1.maven.org/maven2/org/apache/paimon/paimon-flink-1.20/$PAIMON_VERSION$/paimon-flink-1.20-$PAIMON_VERSION$.jar"
# Fluss lake plugin
-wget -O "lib/fluss-lake-paimon-$FLUSS_DOCKER_VERSION$.jar"
"https://repo1.maven.org/maven2/org/apache/fluss/fluss-lake-paimon/$FLUSS_DOCKER_VERSION$/fluss-lake-paimon-$FLUSS_DOCKER_VERSION$.jar"
+curl -fL -o "lib/fluss-lake-paimon-$FLUSS_DOCKER_VERSION$.jar"
"https://repo1.maven.org/maven2/org/apache/fluss/fluss-lake-paimon/$FLUSS_DOCKER_VERSION$/fluss-lake-paimon-$FLUSS_DOCKER_VERSION$.jar"
# Paimon bundle jar
-wget -O "lib/paimon-bundle-$PAIMON_VERSION$.jar"
"https://repo.maven.apache.org/maven2/org/apache/paimon/paimon-bundle/$PAIMON_VERSION$/paimon-bundle-$PAIMON_VERSION$.jar"
+curl -fL -o "lib/paimon-bundle-$PAIMON_VERSION$.jar"
"https://repo.maven.apache.org/maven2/org/apache/paimon/paimon-bundle/$PAIMON_VERSION$/paimon-bundle-$PAIMON_VERSION$.jar"
# Hadoop bundle jar
-wget -O lib/flink-shaded-hadoop-2-uber-2.8.3-10.0.jar
https://repo.maven.apache.org/maven2/org/apache/flink/flink-shaded-hadoop-2-uber/2.8.3-10.0/flink-shaded-hadoop-2-uber-2.8.3-10.0.jar
+curl -fL -o lib/flink-shaded-hadoop-2-uber-2.8.3-10.0.jar
https://repo.maven.apache.org/maven2/org/apache/flink/flink-shaded-hadoop-2-uber/2.8.3-10.0/flink-shaded-hadoop-2-uber-2.8.3-10.0.jar
+
+# AWS S3 support
+curl -fL -o "lib/paimon-s3-$PAIMON_VERSION$.jar"
"https://repo.maven.apache.org/maven2/org/apache/paimon/paimon-s3/$PAIMON_VERSION$/paimon-s3-$PAIMON_VERSION$.jar"
# Tiering service
-wget -O "opt/fluss-flink-tiering-$FLUSS_DOCKER_VERSION$.jar"
"https://repo1.maven.org/maven2/org/apache/fluss/fluss-flink-tiering/$FLUSS_DOCKER_VERSION$/fluss-flink-tiering-$FLUSS_DOCKER_VERSION$.jar"
+curl -fL -o "opt/fluss-flink-tiering-$FLUSS_DOCKER_VERSION$.jar"
"https://repo1.maven.org/maven2/org/apache/fluss/fluss-flink-tiering/$FLUSS_DOCKER_VERSION$/fluss-flink-tiering-$FLUSS_DOCKER_VERSION$.jar"
```
:::info
You can add more jars to this `lib` directory based on your requirements:
-- **Cloud storage support**: For AWS S3 integration with Paimon, add the
corresponding
[paimon-s3](https://repo.maven.apache.org/maven2/org/apache/paimon/paimon-s3/$PAIMON_VERSION$/paimon-s3-$PAIMON_VERSION$.jar)
- **Other catalog backends**: Add jars needed for alternative Paimon catalog
implementations (e.g., Hive, JDBC)
:::
@@ -65,23 +67,59 @@ You can add more jars to this `lib` directory based on your
requirements:
```yaml
services:
+ #begin RustFS (S3-compatible storage)
+ rustfs:
+ image: rustfs/rustfs:latest
+ ports:
+ - "9000:9000"
+ - "9001:9001"
+ environment:
+ - RUSTFS_ACCESS_KEY=rustfsadmin
+ - RUSTFS_SECRET_KEY=rustfsadmin
+ - RUSTFS_CONSOLE_ENABLE=true
+ volumes:
+ - rustfs-data:/data
+ command: /data
+ rustfs-init:
+ image: minio/mc
+ depends_on:
+ - rustfs
+ entrypoint: >
+ /bin/sh -c "
+ until mc alias set rustfs http://rustfs:9000 rustfsadmin rustfsadmin; do
+ echo 'Waiting for RustFS...';
+ sleep 1;
+ done;
+ mc mb --ignore-existing rustfs/fluss;
+ "
+ #end
coordinator-server:
image: apache/fluss:$FLUSS_DOCKER_VERSION$
command: coordinatorServer
depends_on:
- - zookeeper
+ zookeeper:
+ condition: service_started
+ rustfs-init:
+ condition: service_completed_successfully
environment:
- |
FLUSS_PROPERTIES=
zookeeper.address: zookeeper:2181
bind.listeners: FLUSS://coordinator-server:9123
- remote.data.dir: /tmp/fluss/remote-data
+ remote.data.dir: s3://fluss/remote-data
+ s3.endpoint: http://rustfs:9000
+ s3.access-key: rustfsadmin
+ s3.secret-key: rustfsadmin
+ s3.path.style.access: true
datalake.format: paimon
datalake.paimon.metastore: filesystem
- datalake.paimon.warehouse: /tmp/paimon
+ datalake.paimon.warehouse: s3://fluss/paimon
+ datalake.paimon.s3.endpoint: http://rustfs:9000
+ datalake.paimon.s3.access-key: rustfsadmin
+ datalake.paimon.s3.secret-key: rustfsadmin
+ datalake.paimon.s3.path.style.access: true
volumes:
- - shared-tmpfs:/tmp/paimon
- - shared-tmpfs:/tmp/fluss
+ -
./lib/paimon-s3-$PAIMON_VERSION$.jar:/opt/fluss/plugins/paimon/paimon-s3-$PAIMON_VERSION$.jar
tablet-server:
image: apache/fluss:$FLUSS_DOCKER_VERSION$
command: tabletServer
@@ -93,14 +131,21 @@ services:
zookeeper.address: zookeeper:2181
bind.listeners: FLUSS://tablet-server:9123
data.dir: /tmp/fluss/data
- remote.data.dir: /tmp/fluss/remote-data
+ remote.data.dir: s3://fluss/remote-data
+ s3.endpoint: http://rustfs:9000
+ s3.access-key: rustfsadmin
+ s3.secret-key: rustfsadmin
+ s3.path.style.access: true
kv.snapshot.interval: 0s
datalake.format: paimon
datalake.paimon.metastore: filesystem
- datalake.paimon.warehouse: /tmp/paimon
+ datalake.paimon.warehouse: s3://fluss/paimon
+ datalake.paimon.s3.endpoint: http://rustfs:9000
+ datalake.paimon.s3.access-key: rustfsadmin
+ datalake.paimon.s3.secret-key: rustfsadmin
+ datalake.paimon.s3.path.style.access: true
volumes:
- - shared-tmpfs:/tmp/paimon
- - shared-tmpfs:/tmp/fluss
+ -
./lib/paimon-s3-$PAIMON_VERSION$.jar:/opt/fluss/plugins/paimon/paimon-s3-$PAIMON_VERSION$.jar
zookeeper:
restart: always
image: zookeeper:3.9.2
@@ -110,8 +155,7 @@ services:
- "8083:8081"
entrypoint: ["/bin/bash", "-c"]
command: >
- "sed -i 's/exec $(drop_privs_cmd)//g' /docker-entrypoint.sh &&
- cp /tmp/jars/*.jar /opt/flink/lib/ 2>/dev/null || true;
+ "cp /tmp/jars/*.jar /opt/flink/lib/ 2>/dev/null || true;
cp /tmp/opt/*.jar /opt/flink/opt/ 2>/dev/null || true;
/docker-entrypoint.sh jobmanager"
environment:
@@ -119,8 +163,6 @@ services:
FLINK_PROPERTIES=
jobmanager.rpc.address: jobmanager
volumes:
- - shared-tmpfs:/tmp/paimon
- - shared-tmpfs:/tmp/fluss
- ./lib:/tmp/jars
- ./opt:/tmp/opt
taskmanager:
@@ -129,8 +171,7 @@ services:
- jobmanager
entrypoint: ["/bin/bash", "-c"]
command: >
- "sed -i 's/exec $(drop_privs_cmd)//g' /docker-entrypoint.sh &&
- cp /tmp/jars/*.jar /opt/flink/lib/ 2>/dev/null || true;
+ "cp /tmp/jars/*.jar /opt/flink/lib/ 2>/dev/null || true;
cp /tmp/opt/*.jar /opt/flink/opt/ 2>/dev/null || true;
/docker-entrypoint.sh taskmanager"
environment:
@@ -139,23 +180,24 @@ services:
jobmanager.rpc.address: jobmanager
taskmanager.numberOfTaskSlots: 10
taskmanager.memory.process.size: 2048m
+ taskmanager.memory.task.off-heap.size: 128m
volumes:
- - shared-tmpfs:/tmp/paimon
- - shared-tmpfs:/tmp/fluss
- ./lib:/tmp/jars
- ./opt:/tmp/opt
volumes:
- shared-tmpfs:
- driver: local
- driver_opts:
- type: "tmpfs"
- device: "tmpfs"
+ rustfs-data:
```
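[Editor's note] The `rustfs-init` service in the compose file above encodes a common pattern: retry until the dependency is reachable, then run an idempotent setup step (`mc mb --ignore-existing`). As a generic shell sketch (names are illustrative):

```shell
# Retry a command until it succeeds or the attempt budget runs out.
wait_for() {
  attempts=$1; shift
  i=0
  until "$@"; do
    i=$((i + 1))
    if [ "$i" -ge "$attempts" ]; then
      echo "gave up after $attempts attempts: $*" >&2
      return 1
    fi
    sleep 1
  done
}

# e.g. wait_for 30 mc alias set rustfs http://rustfs:9000 rustfsadmin rustfsadmin
```

Combined with `depends_on` / `condition: service_completed_successfully`, this ensures the `fluss` bucket exists before the coordinator tries to write to `s3://fluss/remote-data`.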
The Docker Compose environment consists of the following containers:
- **Fluss Cluster:** a Fluss `CoordinatorServer`, a Fluss `TabletServer` and a
`ZooKeeper` server.
- **Flink Cluster**: a Flink `JobManager` and a Flink `TaskManager` container
to execute queries.
+- **RustFS**: an S3-compatible storage system used both as Fluss remote
storage and as Paimon's filesystem warehouse.
+
+:::tip
+[RustFS](https://github.com/rustfs/rustfs) is used as a replacement for S3 in
this quickstart. For a production setup you may want to use a cloud file
system instead; see [here](/maintenance/filesystems/overview.md) for
information on how to set up cloud file systems.
+:::
4. To start all containers, run:
```shell
@@ -198,25 +240,30 @@ cd fluss-quickstart-iceberg
mkdir -p lib opt
# Flink connectors
-wget -O lib/flink-faker-0.5.3.jar
https://github.com/knaufk/flink-faker/releases/download/v0.5.3/flink-faker-0.5.3.jar
-wget -O "lib/fluss-flink-1.20-$FLUSS_DOCKER_VERSION$.jar"
"https://repo1.maven.org/maven2/org/apache/fluss/fluss-flink-1.20/$FLUSS_DOCKER_VERSION$/fluss-flink-1.20-$FLUSS_DOCKER_VERSION$.jar"
-wget -O lib/iceberg-flink-runtime-1.20-1.10.1.jar
"https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-flink-runtime-1.20/1.10.1/iceberg-flink-runtime-1.20-1.10.1.jar"
+curl -fL -o lib/flink-faker-0.5.3.jar
https://github.com/knaufk/flink-faker/releases/download/v0.5.3/flink-faker-0.5.3.jar
+curl -fL -o "lib/fluss-flink-1.20-$FLUSS_DOCKER_VERSION$.jar"
"https://repo1.maven.org/maven2/org/apache/fluss/fluss-flink-1.20/$FLUSS_DOCKER_VERSION$/fluss-flink-1.20-$FLUSS_DOCKER_VERSION$.jar"
+curl -fL -o lib/iceberg-flink-runtime-1.20-1.10.1.jar
https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-flink-runtime-1.20/1.10.1/iceberg-flink-runtime-1.20-1.10.1.jar
# Fluss lake plugin
-wget -O "lib/fluss-lake-iceberg-$FLUSS_DOCKER_VERSION$.jar"
"https://repo1.maven.org/maven2/org/apache/fluss/fluss-lake-iceberg/$FLUSS_DOCKER_VERSION$/fluss-lake-iceberg-$FLUSS_DOCKER_VERSION$.jar"
+curl -fL -o "lib/fluss-lake-iceberg-$FLUSS_DOCKER_VERSION$.jar"
"https://repo1.maven.org/maven2/org/apache/fluss/fluss-lake-iceberg/$FLUSS_DOCKER_VERSION$/fluss-lake-iceberg-$FLUSS_DOCKER_VERSION$.jar"
+
+# Iceberg AWS support (S3FileIO + AWS SDK)
+curl -fL -o lib/iceberg-aws-1.10.1.jar
https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-aws/1.10.1/iceberg-aws-1.10.1.jar
+curl -fL -o lib/iceberg-aws-bundle-1.10.1.jar
https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-aws-bundle/1.10.1/iceberg-aws-bundle-1.10.1.jar
-# Hadoop filesystem support
-wget -O lib/hadoop-apache-3.3.5-2.jar
https://repo1.maven.org/maven2/io/trino/hadoop/hadoop-apache/3.3.5-2/hadoop-apache-3.3.5-2.jar
-wget -O lib/failsafe-3.3.2.jar
https://repo1.maven.org/maven2/dev/failsafe/failsafe/3.3.2/failsafe-3.3.2.jar
+# JDBC catalog driver
+curl -fL -o lib/postgresql-42.7.4.jar
https://repo1.maven.org/maven2/org/postgresql/postgresql/42.7.4/postgresql-42.7.4.jar
+
+# Hadoop client (required by Iceberg's Flink integration)
+curl -fL -o lib/hadoop-client-api-3.3.5.jar
https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-client-api/3.3.5/hadoop-client-api-3.3.5.jar
+curl -fL -o lib/hadoop-client-runtime-3.3.5.jar
https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-client-runtime/3.3.5/hadoop-client-runtime-3.3.5.jar
# Tiering service
-wget -O "opt/fluss-flink-tiering-$FLUSS_DOCKER_VERSION$.jar"
"https://repo1.maven.org/maven2/org/apache/fluss/fluss-flink-tiering/$FLUSS_DOCKER_VERSION$/fluss-flink-tiering-$FLUSS_DOCKER_VERSION$.jar"
+curl -fL -o "opt/fluss-flink-tiering-$FLUSS_DOCKER_VERSION$.jar"
"https://repo1.maven.org/maven2/org/apache/fluss/fluss-flink-tiering/$FLUSS_DOCKER_VERSION$/fluss-flink-tiering-$FLUSS_DOCKER_VERSION$.jar"
```
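[Editor's note] When scripting downloads like the ones above, it can be worth verifying checksums; Maven Central publishes a `.sha1` file next to every artifact. A hedged sketch (the helper name is hypothetical; the `.sha1` URL pattern is Maven Central's convention):

```shell
# Hypothetical hardening of the download step: fetch the artifact and verify
# it against the .sha1 checksum Maven Central publishes next to every file.
fetch_verified() {
  url=$1
  dest=$2
  curl -fL -o "$dest" "$url" || return 1
  expected=$(curl -fsL "$url.sha1") || return 1
  actual=$(sha1sum "$dest" | cut -d' ' -f1)
  if [ "$actual" != "$expected" ]; then
    echo "checksum mismatch for $dest" >&2
    return 1
  fi
}
```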
:::info
You can add more jars to this `lib` directory based on your requirements:
-- **Cloud storage support**: For AWS S3 integration with Iceberg, add the
corresponding Iceberg bundle jars (e.g., `iceberg-aws-bundle`)
-- **Custom Hadoop configurations**: Add jars for specific HDFS distributions
or custom authentication mechanisms
- **Other catalog backends**: Add jars needed for alternative Iceberg catalog
implementations (e.g., Rest, Hive, Glue)
:::
@@ -224,25 +271,81 @@ You can add more jars to this `lib` directory based on
your requirements:
```yaml
services:
+ #begin RustFS (S3-compatible storage)
+ rustfs:
+ image: rustfs/rustfs:latest
+ ports:
+ - "9000:9000"
+ - "9001:9001"
+ environment:
+ - RUSTFS_ACCESS_KEY=rustfsadmin
+ - RUSTFS_SECRET_KEY=rustfsadmin
+ - RUSTFS_CONSOLE_ENABLE=true
+ volumes:
+ - rustfs-data:/data
+ command: /data
+ rustfs-init:
+ image: minio/mc
+ depends_on:
+ - rustfs
+ entrypoint: >
+ /bin/sh -c "
+ until mc alias set rustfs http://rustfs:9000 rustfsadmin rustfsadmin; do
+ echo 'Waiting for RustFS...';
+ sleep 1;
+ done;
+ mc mb --ignore-existing rustfs/fluss;
+ "
+ #end
+ postgres:
+ image: postgres:17
+ environment:
+ - POSTGRES_USER=iceberg
+ - POSTGRES_PASSWORD=iceberg
+ - POSTGRES_DB=iceberg
+ healthcheck:
+ test: ["CMD-SHELL", "pg_isready -U iceberg"]
+ interval: 3s
+ timeout: 3s
+ retries: 5
coordinator-server:
image: apache/fluss:$FLUSS_DOCKER_VERSION$
command: coordinatorServer
depends_on:
- - zookeeper
+ postgres:
+ condition: service_healthy
+ zookeeper:
+ condition: service_started
+ rustfs-init:
+ condition: service_completed_successfully
environment:
- |
FLUSS_PROPERTIES=
zookeeper.address: zookeeper:2181
bind.listeners: FLUSS://coordinator-server:9123
- remote.data.dir: /tmp/fluss/remote-data
+ remote.data.dir: s3://fluss/remote-data
+ s3.endpoint: http://rustfs:9000
+ s3.access-key: rustfsadmin
+ s3.secret-key: rustfsadmin
+ s3.path.style.access: true
datalake.format: iceberg
- datalake.iceberg.type: hadoop
- datalake.iceberg.warehouse: /tmp/iceberg
+ datalake.iceberg.catalog-impl: org.apache.iceberg.jdbc.JdbcCatalog
+ datalake.iceberg.name: fluss_catalog
+ datalake.iceberg.uri: jdbc:postgresql://postgres:5432/iceberg
+ datalake.iceberg.jdbc.user: iceberg
+ datalake.iceberg.jdbc.password: iceberg
+ datalake.iceberg.warehouse: s3://fluss/iceberg
+ datalake.iceberg.io-impl: org.apache.iceberg.aws.s3.S3FileIO
+ datalake.iceberg.s3.endpoint: http://rustfs:9000
+ datalake.iceberg.s3.access-key-id: rustfsadmin
+ datalake.iceberg.s3.secret-access-key: rustfsadmin
+ datalake.iceberg.s3.path-style-access: true
+ datalake.iceberg.client.region: us-east-1
volumes:
- - shared-tmpfs:/tmp/iceberg
- - shared-tmpfs:/tmp/fluss
-
./lib/fluss-lake-iceberg-$FLUSS_DOCKER_VERSION$.jar:/opt/fluss/plugins/iceberg/fluss-lake-iceberg-$FLUSS_DOCKER_VERSION$.jar
- -
./lib/hadoop-apache-3.3.5-2.jar:/opt/fluss/plugins/iceberg/hadoop-apache-3.3.5-2.jar
+ -
./lib/iceberg-aws-1.10.1.jar:/opt/fluss/plugins/iceberg/iceberg-aws-1.10.1.jar
+ -
./lib/iceberg-aws-bundle-1.10.1.jar:/opt/fluss/plugins/iceberg/iceberg-aws-bundle-1.10.1.jar
+ -
./lib/postgresql-42.7.4.jar:/opt/fluss/plugins/iceberg/postgresql-42.7.4.jar
tablet-server:
image: apache/fluss:$FLUSS_DOCKER_VERSION$
command: tabletServer
@@ -254,14 +357,25 @@ services:
zookeeper.address: zookeeper:2181
bind.listeners: FLUSS://tablet-server:9123
data.dir: /tmp/fluss/data
- remote.data.dir: /tmp/fluss/remote-data
kv.snapshot.interval: 0s
+ remote.data.dir: s3://fluss/remote-data
+ s3.endpoint: http://rustfs:9000
+ s3.access-key: rustfsadmin
+ s3.secret-key: rustfsadmin
+ s3.path.style.access: true
datalake.format: iceberg
- datalake.iceberg.type: hadoop
- datalake.iceberg.warehouse: /tmp/iceberg
- volumes:
- - shared-tmpfs:/tmp/iceberg
- - shared-tmpfs:/tmp/fluss
+ datalake.iceberg.catalog-impl: org.apache.iceberg.jdbc.JdbcCatalog
+ datalake.iceberg.name: fluss_catalog
+ datalake.iceberg.uri: jdbc:postgresql://postgres:5432/iceberg
+ datalake.iceberg.jdbc.user: iceberg
+ datalake.iceberg.jdbc.password: iceberg
+ datalake.iceberg.warehouse: s3://fluss/iceberg
+ datalake.iceberg.io-impl: org.apache.iceberg.aws.s3.S3FileIO
+ datalake.iceberg.s3.endpoint: http://rustfs:9000
+ datalake.iceberg.s3.access-key-id: rustfsadmin
+ datalake.iceberg.s3.secret-access-key: rustfsadmin
+ datalake.iceberg.s3.path-style-access: true
+ datalake.iceberg.client.region: us-east-1
zookeeper:
restart: always
image: zookeeper:3.9.2
@@ -271,8 +385,7 @@ services:
- "8083:8081"
entrypoint: ["/bin/bash", "-c"]
command: >
- "sed -i 's/exec $(drop_privs_cmd)//g' /docker-entrypoint.sh &&
- cp /tmp/jars/*.jar /opt/flink/lib/ 2>/dev/null || true;
+ "cp /tmp/jars/*.jar /opt/flink/lib/ 2>/dev/null || true;
cp /tmp/opt/*.jar /opt/flink/opt/ 2>/dev/null || true;
/docker-entrypoint.sh jobmanager"
environment:
@@ -280,8 +393,6 @@ services:
FLINK_PROPERTIES=
jobmanager.rpc.address: jobmanager
volumes:
- - shared-tmpfs:/tmp/iceberg
- - shared-tmpfs:/tmp/fluss
- ./lib:/tmp/jars
- ./opt:/tmp/opt
taskmanager:
@@ -290,8 +401,7 @@ services:
- jobmanager
entrypoint: ["/bin/bash", "-c"]
command: >
- "sed -i 's/exec $(drop_privs_cmd)//g' /docker-entrypoint.sh &&
- cp /tmp/jars/*.jar /opt/flink/lib/ 2>/dev/null || true;
+ "cp /tmp/jars/*.jar /opt/flink/lib/ 2>/dev/null || true;
cp /tmp/opt/*.jar /opt/flink/opt/ 2>/dev/null || true;
/docker-entrypoint.sh taskmanager"
environment:
@@ -300,23 +410,24 @@ services:
jobmanager.rpc.address: jobmanager
taskmanager.numberOfTaskSlots: 10
taskmanager.memory.process.size: 2048m
+ taskmanager.memory.task.off-heap.size: 128m
volumes:
- - shared-tmpfs:/tmp/iceberg
- - shared-tmpfs:/tmp/fluss
- ./lib:/tmp/jars
- ./opt:/tmp/opt
volumes:
- shared-tmpfs:
- driver: local
- driver_opts:
- type: "tmpfs"
- device: "tmpfs"
+ rustfs-data:
```
The Docker Compose environment consists of the following containers:
- **Fluss Cluster:** a Fluss `CoordinatorServer`, a Fluss `TabletServer` and a
`ZooKeeper` server.
- **Flink Cluster**: a Flink `JobManager` and a Flink `TaskManager` container
to execute queries.
+- **PostgreSQL**: stores Iceberg catalog metadata (used by `JdbcCatalog`).
+- **RustFS**: an S3-compatible storage system used both as Fluss remote
storage and as Iceberg's filesystem warehouse.
+
+:::tip
+[RustFS](https://github.com/rustfs/rustfs) is used as a replacement for S3 in
this quickstart. For a production setup you may want to use a cloud file
system instead; see [here](/maintenance/filesystems/overview.md) for
information on how to set up cloud file systems.
+:::
4. To start all containers, run:
```shell
@@ -492,12 +603,33 @@ SET 'table.exec.sink.not-null-enforcer'='DROP';
## Create Fluss Tables
### Create Fluss Catalog
Use the following SQL to create a Fluss catalog:
+
+<Tabs groupId="lake-tabs">
+ <TabItem value="paimon" label="Paimon" default>
+
+```sql title="Flink SQL"
+CREATE CATALOG fluss_catalog WITH (
+ 'type' = 'fluss',
+ 'bootstrap.servers' = 'coordinator-server:9123',
+ 'paimon.s3.access-key' = 'rustfsadmin',
+ 'paimon.s3.secret-key' = 'rustfsadmin'
+);
+```
+ </TabItem>
+
+ <TabItem value="iceberg" label="Iceberg">
+
```sql title="Flink SQL"
CREATE CATALOG fluss_catalog WITH (
'type' = 'fluss',
- 'bootstrap.servers' = 'coordinator-server:9123'
+ 'bootstrap.servers' = 'coordinator-server:9123',
+ 'iceberg.jdbc.password' = 'iceberg',
+ 'iceberg.s3.access-key-id' = 'rustfsadmin',
+ 'iceberg.s3.secret-access-key' = 'rustfsadmin'
);
```
+ </TabItem>
+</Tabs>
```sql title="Flink SQL"
USE CATALOG fluss_catalog;
@@ -614,7 +746,11 @@ docker compose exec jobmanager \
--fluss.bootstrap.servers coordinator-server:9123 \
--datalake.format paimon \
--datalake.paimon.metastore filesystem \
- --datalake.paimon.warehouse /tmp/paimon
+ --datalake.paimon.warehouse s3://fluss/paimon \
+ --datalake.paimon.s3.endpoint http://rustfs:9000 \
+ --datalake.paimon.s3.access-key rustfsadmin \
+ --datalake.paimon.s3.secret-key rustfsadmin \
+ --datalake.paimon.s3.path.style.access true
```
You should see a Flink Job to tier data from Fluss to Paimon running in the
[Flink Web UI](http://localhost:8083/).
@@ -627,11 +763,21 @@ Open a new terminal, navigate to the
`fluss-quickstart-iceberg` directory, and e
```shell
docker compose exec jobmanager \
/opt/flink/bin/flink run \
- /opt/flink/opt/fluss-flink-tiering.jar \
+ /opt/flink/opt/fluss-flink-tiering-$FLUSS_DOCKER_VERSION$.jar \
--fluss.bootstrap.servers coordinator-server:9123 \
--datalake.format iceberg \
- --datalake.iceberg.type hadoop \
- --datalake.iceberg.warehouse /tmp/iceberg
+ --datalake.iceberg.catalog-impl org.apache.iceberg.jdbc.JdbcCatalog \
+ --datalake.iceberg.name fluss_catalog \
+ --datalake.iceberg.uri "jdbc:postgresql://postgres:5432/iceberg" \
+ --datalake.iceberg.jdbc.user iceberg \
+ --datalake.iceberg.jdbc.password iceberg \
+ --datalake.iceberg.warehouse "s3://fluss/iceberg" \
+ --datalake.iceberg.io-impl org.apache.iceberg.aws.s3.S3FileIO \
+ --datalake.iceberg.s3.endpoint "http://rustfs:9000" \
+ --datalake.iceberg.s3.access-key-id rustfsadmin \
+ --datalake.iceberg.s3.secret-access-key rustfsadmin \
+ --datalake.iceberg.s3.path-style-access true \
+ --datalake.iceberg.client.region us-east-1
```
You should see a Flink Job to tier data from Fluss to Iceberg running in the
[Flink Web UI](http://localhost:8083/).
@@ -888,9 +1034,19 @@ The files adhere to Iceberg's standard format, enabling
seamless querying with o
</TabItem>
</Tabs>
+### Quitting the SQL Client
+
+The following command quits the Flink SQL client.
+```sql title="Flink SQL"
+quit;
+```
+
+### Tiered Storage
+
+You can visit http://localhost:9001/ and sign in with `rustfsadmin` /
`rustfsadmin` to view the files stored on tiered storage.
+
## Clean up
-After finishing the tutorial, run `exit` to exit Flink SQL CLI Container and
then run
+Run the following to stop all containers.
```shell
docker compose down -v
-```
-to stop all containers.
\ No newline at end of file
+```
\ No newline at end of file