This is an automated email from the ASF dual-hosted git repository.
yuxia pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/fluss.git
The following commit(s) were added to refs/heads/main by this push:
new 368551979 [docs] update quickstart paimon to align to latest branch
368551979 is described below
commit 368551979a5d4a3daef5a64e0aa5a3fa734c1028
Author: yuxia Luo <[email protected]>
AuthorDate: Fri Feb 6 14:17:54 2026 +0800
[docs] update quickstart paimon to align to latest branch
---
.../tiered-storage/lakehouse-storage.md | 4 +-
website/docs/quickstart/lakehouse.md | 168 ++++++++++++++-------
2 files changed, 115 insertions(+), 57 deletions(-)
diff --git a/website/docs/maintenance/tiered-storage/lakehouse-storage.md b/website/docs/maintenance/tiered-storage/lakehouse-storage.md
index f4ac426aa..2b9e984f3 100644
--- a/website/docs/maintenance/tiered-storage/lakehouse-storage.md
+++ b/website/docs/maintenance/tiered-storage/lakehouse-storage.md
@@ -35,7 +35,7 @@ datalake.paimon.metastore: filesystem
datalake.paimon.warehouse: /tmp/paimon
```
-Fluss processes Paimon configurations by removing the `datalake.paimon.` prefix and then use the remaining configuration (without the prefix `datalake.paimon.`) to create the Paimon catalog. Checkout the [Paimon documentation](https://paimon.apache.org/docs/1.3/maintenance/configurations/) for more details on the available configurations.
+Fluss processes Paimon configurations by removing the `datalake.paimon.` prefix and then uses the remaining configuration (without the prefix) to create the Paimon catalog. Check out the [Paimon documentation](https://paimon.apache.org/docs/$PAIMON_VERSION_SHORT$/maintenance/configurations/) for more details on the available configurations.
For example, if you want to configure to use Hive catalog, you can configure like following:
```yaml
@@ -66,7 +66,7 @@ Then, you must start the datalake tiering service to tier Fluss's data to the la
- Put [fluss-lake-paimon jar](https://repo1.maven.org/maven2/org/apache/fluss/fluss-lake-paimon/$FLUSS_VERSION$/fluss-lake-paimon-$FLUSS_VERSION$.jar) into `${FLINK_HOME}/lib`
- Put [paimon-bundle jar](https://repo.maven.apache.org/maven2/org/apache/paimon/paimon-bundle/$PAIMON_VERSION$/paimon-bundle-$PAIMON_VERSION$.jar) into `${FLINK_HOME}/lib`
- [Download](https://flink.apache.org/downloads/) pre-bundled Hadoop jar `flink-shaded-hadoop-2-uber-*.jar` and put into `${FLINK_HOME}/lib`
-- Put Paimon's [filesystem jar](https://paimon.apache.org/docs/1.3/project/download/) into `${FLINK_HOME}/lib`, if you use s3 to store paimon data, please put `paimon-s3` jar into `${FLINK_HOME}/lib`
+- Put Paimon's [filesystem jar](https://paimon.apache.org/docs/$PAIMON_VERSION_SHORT$/project/download/) into `${FLINK_HOME}/lib`. If you use S3 to store Paimon data, also put the `paimon-s3` jar into `${FLINK_HOME}/lib`.
- The other jars that Paimon may require, for example, if you use HiveCatalog, you will need to put hive related jars
diff --git a/website/docs/quickstart/lakehouse.md b/website/docs/quickstart/lakehouse.md
index 91785b436..f99175fc2 100644
--- a/website/docs/quickstart/lakehouse.md
+++ b/website/docs/quickstart/lakehouse.md
@@ -32,12 +32,39 @@ mkdir fluss-quickstart-paimon
cd fluss-quickstart-paimon
```
-2. Create a `docker-compose.yml` file with the following content:
+2. Create directories and download required jars:
+
+```shell
+mkdir -p lib opt
+
+# Flink connectors
+wget -O lib/flink-faker-0.5.3.jar https://github.com/knaufk/flink-faker/releases/download/v0.5.3/flink-faker-0.5.3.jar
+wget -O "lib/fluss-flink-1.20-$FLUSS_DOCKER_VERSION$.jar" "https://repo1.maven.org/maven2/org/apache/fluss/fluss-flink-1.20/$FLUSS_DOCKER_VERSION$/fluss-flink-1.20-$FLUSS_DOCKER_VERSION$.jar"
+wget -O "lib/paimon-flink-1.20-$PAIMON_VERSION$.jar" "https://repo1.maven.org/maven2/org/apache/paimon/paimon-flink-1.20/$PAIMON_VERSION$/paimon-flink-1.20-$PAIMON_VERSION$.jar"
+
+# Fluss lake plugin
+wget -O "lib/fluss-lake-paimon-$FLUSS_DOCKER_VERSION$.jar" "https://repo1.maven.org/maven2/org/apache/fluss/fluss-lake-paimon/$FLUSS_DOCKER_VERSION$/fluss-lake-paimon-$FLUSS_DOCKER_VERSION$.jar"
+
+# Paimon bundle jar
+wget -O "lib/paimon-bundle-$PAIMON_VERSION$.jar" "https://repo.maven.apache.org/maven2/org/apache/paimon/paimon-bundle/$PAIMON_VERSION$/paimon-bundle-$PAIMON_VERSION$.jar"
+
+# Hadoop bundle jar
+wget -O lib/flink-shaded-hadoop-2-uber-2.8.3-10.0.jar https://repo.maven.apache.org/maven2/org/apache/flink/flink-shaded-hadoop-2-uber/2.8.3-10.0/flink-shaded-hadoop-2-uber-2.8.3-10.0.jar
+
+# Tiering service
+wget -O "opt/fluss-flink-tiering-$FLUSS_DOCKER_VERSION$.jar" "https://repo1.maven.org/maven2/org/apache/fluss/fluss-flink-tiering/$FLUSS_DOCKER_VERSION$/fluss-flink-tiering-$FLUSS_DOCKER_VERSION$.jar"
+```
+:::info
+You can add more jars to this `lib` directory based on your requirements:
+- **Cloud storage support**: For AWS S3 integration with Paimon, add the corresponding [paimon-s3 jar](https://repo.maven.apache.org/maven2/org/apache/paimon/paimon-s3/$PAIMON_VERSION$/paimon-s3-$PAIMON_VERSION$.jar)
+- **Other catalog backends**: Add jars needed for alternative Paimon catalog implementations (e.g., Hive, JDBC)
+:::
+
+3. Create a `docker-compose.yml` file with the following content:
```yaml
services:
- #begin Fluss cluster
coordinator-server:
image: apache/fluss:$FLUSS_DOCKER_VERSION$
command: coordinatorServer
@@ -54,6 +81,7 @@ services:
datalake.paimon.warehouse: /tmp/paimon
volumes:
- shared-tmpfs:/tmp/paimon
+ - shared-tmpfs:/tmp/fluss
tablet-server:
image: apache/fluss:$FLUSS_DOCKER_VERSION$
command: tabletServer
@@ -72,37 +100,50 @@ services:
datalake.paimon.warehouse: /tmp/paimon
volumes:
- shared-tmpfs:/tmp/paimon
+ - shared-tmpfs:/tmp/fluss
zookeeper:
restart: always
image: zookeeper:3.9.2
- #end
- #begin Flink cluster
jobmanager:
- image: apache/fluss-quickstart-flink:1.20-$FLUSS_DOCKER_VERSION$
+ image: flink:1.20-scala_2.12-java17
ports:
- "8083:8081"
- command: jobmanager
+ entrypoint: ["/bin/bash", "-c"]
+ command: >
+ "sed -i 's/exec $(drop_privs_cmd)//g' /docker-entrypoint.sh &&
+ cp /tmp/jars/*.jar /opt/flink/lib/ 2>/dev/null || true;
+ cp /tmp/opt/*.jar /opt/flink/opt/ 2>/dev/null || true;
+ /docker-entrypoint.sh jobmanager"
environment:
- |
FLINK_PROPERTIES=
jobmanager.rpc.address: jobmanager
volumes:
- shared-tmpfs:/tmp/paimon
+ - shared-tmpfs:/tmp/fluss
+ - ./lib:/tmp/jars
+ - ./opt:/tmp/opt
taskmanager:
- image: apache/fluss-quickstart-flink:1.20-$FLUSS_DOCKER_VERSION$
+ image: flink:1.20-scala_2.12-java17
depends_on:
- jobmanager
- command: taskmanager
+ entrypoint: ["/bin/bash", "-c"]
+ command: >
+ "sed -i 's/exec $(drop_privs_cmd)//g' /docker-entrypoint.sh &&
+ cp /tmp/jars/*.jar /opt/flink/lib/ 2>/dev/null || true;
+ cp /tmp/opt/*.jar /opt/flink/opt/ 2>/dev/null || true;
+ /docker-entrypoint.sh taskmanager"
environment:
- |
FLINK_PROPERTIES=
jobmanager.rpc.address: jobmanager
taskmanager.numberOfTaskSlots: 10
taskmanager.memory.process.size: 2048m
- taskmanager.memory.framework.off-heap.size: 256m
volumes:
- shared-tmpfs:/tmp/paimon
- #end
+ - shared-tmpfs:/tmp/fluss
+ - ./lib:/tmp/jars
+ - ./opt:/tmp/opt
volumes:
shared-tmpfs:
@@ -116,11 +157,7 @@ The Docker Compose environment consists of the following containers:
- **Fluss Cluster:** a Fluss `CoordinatorServer`, a Fluss `TabletServer` and a `ZooKeeper` server.
- **Flink Cluster**: a Flink `JobManager` and a Flink `TaskManager` container to execute queries.
-**Note:** The `apache/fluss-quickstart-flink` image is based on [flink:1.20.3-java17](https://hub.docker.com/layers/library/flink/1.20-java17/images/sha256:296c7c23fa40a9a3547771b08fc65e25f06bc4cfd3549eee243c99890778cafc) and
-includes the [fluss-flink](engine-flink/getting-started.md), [paimon-flink](https://paimon.apache.org/docs/1.3/flink/quick-start/) and
-[flink-connector-faker](https://flink-packages.org/packages/flink-faker) to simplify this guide.
-
-3. To start all containers, run:
+4. To start all containers, run:
```shell
docker compose up -d
```
@@ -312,23 +349,69 @@ Congratulations, you are all set!
First, use the following command to enter the Flink SQL CLI Container:
```shell
-docker compose exec jobmanager ./sql-client
+docker compose exec jobmanager ./bin/sql-client.sh
```
-**Note**:
-To simplify this guide, three temporary tables have been pre-created with `faker` connector to generate data.
-You can view their schemas by running the following commands:
+To simplify this guide, we will create three temporary tables with the `faker` connector to generate data:
+
+```sql title="Flink SQL"
+CREATE TEMPORARY TABLE source_order (
+ `order_key` BIGINT,
+ `cust_key` INT,
+ `total_price` DECIMAL(15, 2),
+ `order_date` DATE,
+ `order_priority` STRING,
+ `clerk` STRING
+) WITH (
+ 'connector' = 'faker',
+ 'rows-per-second' = '10',
+ 'number-of-rows' = '10000',
+ 'fields.order_key.expression' = '#{number.numberBetween ''0'',''100000000''}',
+ 'fields.cust_key.expression' = '#{number.numberBetween ''0'',''20''}',
+ 'fields.total_price.expression' = '#{number.randomDouble ''3'',''1'',''1000''}',
+ 'fields.order_date.expression' = '#{date.past ''100'' ''DAYS''}',
+ 'fields.order_priority.expression' = '#{regexify ''(low|medium|high){1}''}',
+ 'fields.clerk.expression' = '#{regexify ''(Clerk1|Clerk2|Clerk3|Clerk4){1}''}'
+);
+```
```sql title="Flink SQL"
-SHOW CREATE TABLE source_customer;
+CREATE TEMPORARY TABLE source_customer (
+ `cust_key` INT,
+ `name` STRING,
+ `phone` STRING,
+ `nation_key` INT NOT NULL,
+ `acctbal` DECIMAL(15, 2),
+ `mktsegment` STRING,
+ PRIMARY KEY (`cust_key`) NOT ENFORCED
+) WITH (
+ 'connector' = 'faker',
+ 'number-of-rows' = '200',
+ 'fields.cust_key.expression' = '#{number.numberBetween ''0'',''20''}',
+ 'fields.name.expression' = '#{funnyName.name}',
+ 'fields.nation_key.expression' = '#{number.numberBetween ''1'',''5''}',
+ 'fields.phone.expression' = '#{phoneNumber.cellPhone}',
+ 'fields.acctbal.expression' = '#{number.randomDouble ''3'',''1'',''1000''}',
+ 'fields.mktsegment.expression' = '#{regexify ''(AUTOMOBILE|BUILDING|FURNITURE|MACHINERY|HOUSEHOLD){1}''}'
+);
```
```sql title="Flink SQL"
-SHOW CREATE TABLE source_order;
+CREATE TEMPORARY TABLE `source_nation` (
+ `nation_key` INT NOT NULL,
+ `name` STRING,
+ PRIMARY KEY (`nation_key`) NOT ENFORCED
+) WITH (
+ 'connector' = 'faker',
+ 'number-of-rows' = '100',
+ 'fields.nation_key.expression' = '#{number.numberBetween ''1'',''5''}',
+ 'fields.name.expression' = '#{regexify ''(CANADA|JORDAN|CHINA|UNITED|INDIA){1}''}'
+);
```
```sql title="Flink SQL"
-SHOW CREATE TABLE source_nation;
+-- drop records silently if a null value would have to be inserted into a NOT NULL column
+SET 'table.exec.sink.not-null-enforcer'='DROP';
```
</TabItem>
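[Editor's note] The setup statements above can also be collected into one script and run non-interactively rather than pasted into the CLI. A rough sketch, assuming the compose services from this quickstart are running; `init.sql` is an illustrative file name, and only one statement is shown for brevity:

```shell
# Gather the setup SQL into one file (illustrative; add the
# CREATE TEMPORARY TABLE statements above as needed).
cat > init.sql <<'EOF'
SET 'table.exec.sink.not-null-enforcer'='DROP';
EOF
echo "wrote init.sql"

# With the containers up, the script could be copied in and executed:
# docker compose cp init.sql jobmanager:/tmp/init.sql
# docker compose exec jobmanager ./bin/sql-client.sh -f /tmp/init.sql
```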
@@ -635,10 +718,6 @@ CREATE TABLE datalake_enriched_orders (
```
Next, perform streaming data writing into the **datalake-enabled** table, `datalake_enriched_orders`:
-```sql title="Flink SQL"
--- switch to streaming mode
-SET 'execution.runtime-mode' = 'streaming';
-```
```sql title="Flink SQL"
-- insert tuples into datalake_enriched_orders
@@ -674,9 +753,15 @@ The data for the `datalake_enriched_orders` table is stored in Fluss (for real-t
When querying the `datalake_enriched_orders` table, Fluss uses a union operation that combines data from both Fluss and Paimon to provide a complete result set -- combines **real-time** and **historical** data.
If you wish to query only the data stored in Paimon—offering high-performance access without the overhead of unioning data—you can use the `datalake_enriched_orders$lake` table by appending the `$lake` suffix.
-This approach also enables all the optimizations and features of a Flink Paimon table source, including [system table](https://paimon.apache.org/docs/1.3/concepts/system-tables/) such as `datalake_enriched_orders$lake$snapshots`.
+This approach also enables all the optimizations and features of a Flink Paimon table source, including [system tables](https://paimon.apache.org/docs/$PAIMON_VERSION_SHORT$/concepts/system-tables/) such as `datalake_enriched_orders$lake$snapshots`.
To query the snapshots directly from Paimon, use the following SQL:
+
+```sql title="Flink SQL"
+-- use tableau result mode
+SET 'sql-client.execution.result-mode' = 'tableau';
+```
+
```sql title="Flink SQL"
-- switch to batch mode
SET 'execution.runtime-mode' = 'batch';
@@ -726,33 +811,7 @@ The result looks like:
```
You can execute the real-time analytics query multiple times, and the results will vary with each run as new data is continuously written to Fluss in real-time.
-Finally, you can use the following command to view the files stored in Paimon:
-```shell
-docker compose exec taskmanager tree /tmp/paimon/fluss.db
-```
-
-**Sample Output:**
-```shell
-/tmp/paimon/fluss.db
-└── datalake_enriched_orders
- ├── bucket-0
- │ ├── changelog-aef1810f-85b2-4eba-8eb8-9b136dec5bdb-0.orc
- │ └── data-aef1810f-85b2-4eba-8eb8-9b136dec5bdb-1.orc
- ├── manifest
- │ ├── manifest-aaa007e1-81a2-40b3-ba1f-9df4528bc402-0
- │ ├── manifest-aaa007e1-81a2-40b3-ba1f-9df4528bc402-1
- │ ├── manifest-list-ceb77e1f-7d17-4160-9e1f-f334918c6e0d-0
- │ ├── manifest-list-ceb77e1f-7d17-4160-9e1f-f334918c6e0d-1
- │ └── manifest-list-ceb77e1f-7d17-4160-9e1f-f334918c6e0d-2
- ├── schema
- │ └── schema-0
- └── snapshot
- ├── EARLIEST
- ├── LATEST
- └── snapshot-1
-```
-
-The files adhere to Paimon's standard format, enabling seamless querying with other engines such as [Spark](https://paimon.apache.org/docs/1.3/spark/quick-start/) and [Trino](https://paimon.apache.org/docs/1.3/ecosystem/trino/).
+The files adhere to Paimon's standard format, enabling seamless querying with other engines such as [Spark](https://paimon.apache.org/docs/$PAIMON_VERSION_SHORT$/spark/quick-start/) and [Trino](https://paimon.apache.org/docs/$PAIMON_VERSION_SHORT$/ecosystem/trino/).
</TabItem>
@@ -776,7 +835,6 @@ SET 'sql-client.execution.result-mode' = 'tableau';
SET 'execution.runtime-mode' = 'batch';
```
-
```sql title="Flink SQL"
-- query snapshots in iceberg
SELECT snapshot_id, operation FROM datalake_enriched_orders$lake$snapshots;