Re: [PR] [website] Release 0.8 blog post and update Download page [fluss]

via GitHub Sun, 02 Nov 2025 06:21:52 -0800


leonardBang commented on code in PR #1868:
URL: https://github.com/apache/fluss/pull/1868#discussion_r2484835071



##########
website/blog/releases/0.8.md:
##########
@@ -0,0 +1,215 @@
+---
+title: "Announcing Apache Fluss 0.8: Streaming Lakehouse for Data + AI"
+sidebar_label: "Announcing Apache Fluss 0.8"
+authors: [giannis, jark]
+date: 2025-11-08
+tags: [releases]
+---
+
+![Banner](../assets/0.8/banner.jpg)
+
+🌊 We are excited to announce the official release of **Apache Fluss 0.8 
(incubating)**!
+
+This is our first release under the incubator of the Apache Software 
Foundation, marking a significant milestone in our journey to provide a robust 
streaming storage platform for real-time analytics.
+
+Over the past four months, the community has made tremendous progress, 
delivering nearly 400 commits that push the boundaries of the Streaming 
Lakehouse ecosystem. This release introduces deeper integrations, performance 
breakthroughs, and next-generation stream processing capabilities, including:
+
+* 🔗 Tighter integration with Apache Flink for seamless real-time processing. 
+* 🧊 Enhanced Streaming Lakehouse capabilities with full support for [Apache 
Iceberg](https://iceberg.apache.org/) and 
[Lance](https://lancedb.github.io/lance/)
+* ⚡ Introduction of [Delta 
Joins](https://cwiki.apache.org/confluence/display/FLINK/FLIP-486%3A+Introduce+A+New+DeltaJoin)
 with Flink, a game-changing innovation that redefines efficiency in stream 
processing by minimizing state and maximizing speed.
+
+Apache Fluss 0.8 marks the beginning of a new era in streaming:
+**real-time**, **unified**, and **zero-state**, purpose-built to power the 
next generation of data platforms with **low-latency performance**, 
**scalability**, and **architectural simplicity**.
+
+<!-- truncate -->
+
+![Improvements Diagram](../assets/0.8/overview.png)
+
+## Streaming Lakehouse for Iceberg
+
+A key highlight of Fluss 0.8 is the introduction of **Streaming Lakehouse for 
Apache Iceberg** 
([FIP-3](https://cwiki.apache.org/confluence/display/FLUSS/FIP-3%3A+Support+tiering+Fluss+data+to+Iceberg)),
+which transforms Iceberg from a batch-oriented table format into a 
continuously updating Lakehouse. Apache Fluss acts as the **real-time ingestion 
and storage layer**, writing fresh data and updates into Iceberg with 
guaranteed ordering and exactly-once semantics.
+
+This enables real-time data on Fluss to be tiered as Apache Iceberg tables, 
while providing table semantics like partitioning and bucketing on a single 
copy of data.
+Moreover, it solves Iceberg’s long-standing update limitations through Fluss’s 
**native support for upserts and deletes** and its **built-in compaction 
service**,
+which automatically merges small files and maintains optimized Iceberg 
snapshots.
+
+Key benefits include:
+- **Unified Architecture**: Fluss handles sub-second streaming reads and 
writes, while Iceberg stores compacted historical data.
+- **Native Updates and Deletes**: Fluss efficiently applies changes and tiers 
them into Iceberg without rewrite jobs.
+- **Built-in Compaction Service**: The built-in service maintains snapshot 
efficiency with no external tooling.
+- **Efficient Backfilling**: Enables lightning-fast backfill of historical 
data from Iceberg for streaming processing.
+- **Lower Cost**: Reduce storage cost by tiering cold data to Iceberg while 
keeping hot data in Fluss, eliminating the need for duplicate storage.
+- **Lower Latency**: Sub-second data freshness for Iceberg tables through 
Union Read across Fluss and Iceberg.
+
+```yaml title='server.yaml'
+# Iceberg configuration
+datalake.format: iceberg
+
+# Iceberg catalog configuration, assuming a Hadoop catalog
+datalake.iceberg.type: hadoop
+datalake.iceberg.warehouse: /path/to/iceberg
+```
+
+You can find more detailed instructions in the [Iceberg Lakehouse 
documentation](/docs/next/streaming-lakehouse/integrate-data-lakes/iceberg/).
+
+## Real-Time Multimodal AI Analytics with Lance
+
+Another major enhancement in Fluss 0.8 is the addition of **Streaming 
Lakehouse support for [Lance](https://github.com/lancedb/lance)** 
([FIP-5](https://cwiki.apache.org/confluence/display/FLUSS/FIP-5%3A+Support+tiering+Fluss+data+to+Lance)),
+a modern columnar and vector-native data format designed for AI and machine 
learning workloads.
+This integration extends Apache Fluss towards being a real-time ingestion 
platform for multi-modal data and AI,
+handling not just traditional tabular streams but also embeddings, vectors, and 
unstructured features used in AI systems.
+With this release, Fluss can continuously ingest, update, and tier data into 
Lance tables with guaranteed ordering and freshness,
+enabling fast synchronization between streaming pipelines and downstream ML or 
retrieval applications.
+
+Key benefits include:
+
+- **Unified multi-modal data ingestion**: Stream tabular, vector, and 
embedding data into Lance in real time.
+- **AI/ML-ready storage**: Keep feature vectors and embeddings continuously 
up-to-date for model training or inference.
+- **Low-latency analytics and retrieval**: Fast, continuous updates enable 
Lance data to be immediately usable for real-time search and recommendation.
+- **Simplified architecture**: Eliminates complex ETL pipelines between 
streaming systems and vector databases.
+- **Seamless integration**: Combines Fluss’s high-throughput streaming engine 
with Lance’s efficient columnar persistence for consistent, multi-modal data 
management.
+
+```yaml title='server.yaml'
+datalake.format: lance
+datalake.lance.warehouse: s3://<bucket>
+datalake.lance.endpoint: <endpoint>
+datalake.lance.allow_http: true
+datalake.lance.access_key_id: <access_key_id>
+datalake.lance.secret_access_key: <secret_access_key>
+```
+
+See the [LanceDB blog post](https://lancedb.com/blog/fluss-integration/) for 
the full integration. You also can find more detailed instructions in the 
[Lance Lakehouse 
documentation](/docs/next/streaming-lakehouse/integrate-data-lakes/lance/).

Review Comment:
   ```suggestion
   See the [LanceDB blog post](https://lancedb.com/blog/fluss-integration/) for 
the full integration. You can also find more detailed instructions in the 
[Lance Lakehouse 
documentation](/docs/next/streaming-lakehouse/integrate-data-lakes/lance/).
   ```



##########
website/blog/releases/0.8.md:
##########
@@ -0,0 +1,215 @@
+## Flink 2.1
+
+Apache Fluss is now fully compatible with **Apache Flink 2.1**, ensuring 
seamless integration with the latest Flink runtime and APIs.
+This update strengthens Fluss’s role as a unified streaming storage layer, 
providing reliable performance and consistency for modern data pipelines built 
on Flink.
+
+### Delta Join
+
+The Delta Join is a major step towards the era of zero-state streaming joins. 
This release introduces support for Delta Joins with Apache Flink.
+By externalizing state into Fluss tables, Flink performs joins incrementally 
on data deltas, without maintaining large states.
+This architecture reduces CPU and memory usage by **up to 80%**, eliminates 
over **100 TB of state** as witnessed in the first production use cases from 
[early adopters](blog/2025-08-07-taobao-practice.md),
+and cuts checkpoint durations from **90 seconds to just 1 second**. Because 
all data lives natively in Fluss tables,
+there’s **no state bootstrapping**; pipelines start instantly, stay 
lightweight, and achieve efficiency for real-time analytics at scale.
+
+Below is a performance comparison (CPU, memory, state size, checkpoint 
interval) between Delta Join and Stream-Stream Join, as evaluated by Taobao’s 
Search & Recommendation Systems team.
+
+![](../assets/taobao_practice/performance_delta1.png)
+
+![](../assets/taobao_practice/performance_delta2.png)
+
+
+You can find more detailed instructions in the [Delta Join 
documentation](/docs/next/engine-flink/delta-joins/).
+
+### Materialized Table
+
+Apache Fluss 0.8 introduces support for Flink Materialized Tables, enabling 
seamless, low-latency materializations directly over Fluss streams.
+Flink’s Materialized Table turns a SQL query into a continuously or 
periodically refreshed result table with a defined freshness target (e.g., 
seconds or minutes).
+With Fluss as the underlying streaming source, users can declaratively build 
real-time tables that stay up to date without custom orchestration.
+This integration unifies batch and streaming ETL: Fluss delivers 
high-throughput, low-latency data, while Flink continuously maintains derived 
tables for analytics,
+APIs, and downstream workloads, providing real-time, consistent data pipelines 
with minimal operational overhead.
+
+```sql title="Flink SQL"
+-- 1. create a materialized table with 10 seconds freshness
+CREATE MATERIALIZED TABLE fluss.dw.sales_summary
+FRESHNESS = INTERVAL '10' SECOND
+AS SELECT
+  product,
+  SUM(quantity) AS total_sales,
+  CURRENT_TIMESTAMP() AS last_updated
+FROM fluss.dw.sales_detail
+GROUP BY product;
+
+-- 2. suspend data refresh for the materialized table
+ALTER MATERIALIZED TABLE fluss.dw.sales_summary SUSPEND;
+
+-- 3. resume data refresh for the materialized table
+ALTER MATERIALIZED TABLE fluss.dw.sales_summary RESUME
+-- Set table option via WITH clause
+WITH(
+ 'sink.parallelism' = '10'
+);
+```
+
+You can find more detailed instructions in the [Materialized Table 
documentation](/docs/next/engine-flink/ddl/#materialized-table).
+
+## Stability
+
+In this release, we have made significant improvements in the stability and 
reliability of Apache Fluss under large-scale production workloads.
+Through continuous validation across multiple business units within Alibaba 
Group, and especially through large-scale workloads during Alibaba's Double 
11 peak traffic, we have resolved over 35 stability-related issues.
+These improvements substantially enhance Fluss’s robustness in 
mission-critical streaming use cases.
+
+Key improvements include:
+- **[Graceful 
Shutdown](/docs/next/maintenance/operations/graceful-shutdown/)**: Introduced a 
graceful shutdown mechanism for TabletServers. During shutdown, leadership is 
proactively migrated before termination, ensuring that read/write latency 
remains unaffected by node decommissioning.
+- **Accelerated Coordinator Event Processing**: Optimized the Coordinator’s 
event handling mechanism through asynchronous processing and batched ZooKeeper 
operations. As a result, all events are now processed in milliseconds.
+- **Faster Coordinator Recovery**: Parallelized initialization cuts 
Coordinator startup time from 10 minutes to just 20 seconds in production-scale 
benchmarks, dramatically improving service availability and recovery speed.
+- **Optimized Server Metrics**: Refined metric granularity and reporting logic 
to reduce telemetry volume by 90% while preserving full observability.
+- **Enhanced Metadata Performance**: Addressed metadata bottlenecks during 
mass client restarts by strengthening the server local cache and introducing 
asynchronous ZooKeeper operations. This reduces metadata request latency from 
more than 10 seconds to milliseconds, ensuring stable client reconnection under load.
+
+With these foundational stability improvements, Fluss 0.8 is now 
production-ready for the most demanding real-time workloads, including 
Alibaba’s annual Double 11 global shopping festival.
+
+## Dynamic Configuration
+
+Starting with Fluss version 0.8, certain **cluster-level configurations** and 
**table-level configurations** can be updated dynamically, without requiring a 
cluster restart or table recreation. This enables operators and developers to 
adjust system behavior in real time, improving operational agility and 
minimizing downtime.
+
+### Dynamic Cluster Configs
+
+Fluss now supports runtime updates for cluster configuration parameters. These 
changes take effect immediately across the cluster after being applied through 
the API.
+
+```java title="Java Client"
+Admin admin = connection.getAdmin();
+Collection<AlterConfig> configsToUpdate = Arrays.asList(
+    new AlterConfig("datalake.format", "paimon", AlterConfigOpType.SET));
+admin.alterClusterConfigs(configsToUpdate);
+```
+
+### Dynamic Table Configs
+
+Fluss now supports updating table options dynamically using the `ALTER 
TABLE ... SET` statement. This covers all client-side options (like 
`scan.startup.mode`) and some storage-side options (like 
`table.datalake.enabled`).
+
+```sql title="Flink SQL"
+-- Enable lakehouse storage for the given table
+ALTER TABLE my_table SET ('table.datalake.enabled' = 'true');
+```
+
+When you issue an `ALTER TABLE ... SET` command to update storage options on a 
table, the Fluss cluster validates and applies the new configuration 
immediately. The updated settings are propagated to all TabletServers and 
CoordinatorServer components, ensuring consistent behavior going forward.
+
+This capability is especially useful for tuning performance, adapting to 
changing data patterns, or complying with evolving data governance 
requirements—all without service interruption.
+
+You can find more detailed instructions in the [Updating Configs 
documentation](/docs/next/maintenance/operations/updating-configs/).
+
+## Helm Charts
+
+This release also introduces Helm Charts. With this addition, users can now 
deploy and manage a full Fluss cluster using [Helm](https://helm.sh/).
+The Helm chart simplifies provisioning, upgrades, and scaling by packaging 
configuration, manifests, and dependencies into a single, versioned release.
+This should help users run Fluss on Kubernetes faster, more reliably, and 
with easier integration into existing CI/CD and observability setups, 
significantly lowering the barrier for teams adopting Fluss in production.
+
+You can find more detailed instructions in the [Deploying with Helm 
documentation](/docs/next/install-deploy/deploying-with-helm/).
+
+## Ecosystem
+
+The Apache Fluss community is actively expanding Fluss beyond the JVM 
ecosystem with new **native clients** for Rust and Python, enabling seamless 
integration across modern data and AI workflows.
+We’ve established an [official 
repository](https://github.com/apache/fluss-rust) to host both the Rust and 
Python clients, developed with performance, safety, and developer experience in 
mind:
+
+- **🦀 Rust Client**: Built on async I/O, zero-copy columnar streaming (via 
Apache Arrow), and Rust’s memory safety guarantees, this client unlocks 
high-performance query integration with native OLAP engines like DuckDB and 
StarRocks.
+- **🐍 Python Client**: Built as a native binding on top of the Rust client, it 
allows Python developers to interact with Fluss tables and streams directly 
from data science, ML, and analytics workflows.
+
+The Rust and Python clients are maintained in a [separate 
repository](https://github.com/apache/fluss-rust) to allow for faster iteration 
and releases, and therefore are not part of the Fluss 0.8 release.
+However, the community is actively stabilizing the clients and plans to 
release them soon.
+
+## Upgrade Notes
+
+The Fluss community tries to ensure that upgrades are as seamless as possible. 
However, certain changes may require users to make adjustments to certain parts 
of the program when upgrading to version 0.8.
+Please refer to the [upgrade 
notes](/docs/next/maintenance/operations/upgrade-notes-0.8/) for a 
comprehensive list of adjustments to make and issues to check during the 
upgrade process.
+
+## List of contributors
+The Fluss community would like to express gratitude to all the contributors 
who made this release possible:
+
+> Alibaba-HZY, CaoZhen, CenterCode, CodeDrinks, David, Giannis Polyzos, 
Hemanth Savasere, Hongshun Wang, Jark Wu, Jensen, Junbo Wang, Kerwin, Leonard 
Xu, Liebing, Maggie Cao, Mahesh Sambaram, MehulBatra, Michael Koepf, Rafael 
Sousa, Rion Williams, Ron, Sergey Nuyanzin, SeungMin, Wang Cheng, 
XianmingZhou00, Xuyang, Yang Guo, Yang Wang, Yunchi Pang, ZijunZhao, Zmm, 
andybj0228, buvb, cxxwang, dependabot[bot], jackylee, leosanqing, naivedogger, 
ocean.wy, pisceslj, totalo, xiaochen, xiaozhou, xx789, yunhong, yuxia Luo
+
+Apache Fluss is under active development. Be sure to stay updated on the 
project, give it a try and if you like it,
+don’t forget to give it some ❤️ via ⭐ on 
[GitHub](https://github.com/apache/fluss).

Review Comment:
   maybe the ending could be shorter like:
   ```suggestion
   give it a try and show your support with a ⭐ on 
[GitHub](https://github.com/apache/fluss).
   ```


