Re: [PR] [website] Release 0.8 blog post and update Download page [fluss]

via GitHub Sun, 02 Nov 2025 01:20:09 -0700


wuchong commented on code in PR #1868:
URL: https://github.com/apache/fluss/pull/1868#discussion_r2484368584



##########
website/blog/releases/0.8.md:
##########
@@ -0,0 +1,192 @@
+---
+title: "Apache Fluss 0.8: Streaming Lakehouse with Iceberg/Lance"
+sidebar_label: "Announcing Apache Fluss 0.8"
+authors: [giannis, jark]
+date: 2025-10-30
+tags: [releases]
+---
+
+![Banner](../assets/0.8/banner.jpg)
+
+🌊 We are excited to announce the official release of **Fluss 0.8**!
+
+This is the first ASF release for Apache Fluss (incubating), marking a 
significant milestone in our journey to provide a robust streaming storage 
platform for real-time analytics.
+Over the past four months, we’ve delivered lots of improvements and new 
capabilities, with more than 390 commits, across the Streaming Lakehouse 
ecosystem,
+including: deeper integration with Apache Flink, extensive improvements in the 
Streaming Lakehouse with support for [Apache 
Iceberg](https://iceberg.apache.org/) and 
[Lance](https://github.com/lancedb/lance),
+and the introduction of [Delta 
Joins](https://cwiki.apache.org/confluence/display/FLINK/FLIP-486%3A+Introduce+A+New+DeltaJoin),
 which redefine efficiency in stream processing.
+
+Apache Fluss 0.8 marks a new era of **real-time**, **unified**, and 
**zero-state streaming**, designed to power the next generation of data 
platforms, focusing on performance, scalability, and simplicity of the overall 
architecture.
+
+<!-- truncate -->
+
+![Improvements Diagram](../assets/0.8/overview.png)
+
+## Streaming Lakehouse for Iceberg
+
+A key highlight of Fluss 0.8 is the introduction of **Streaming Lakehouse for 
Apache Iceberg** 
([FIP-3](https://cwiki.apache.org/confluence/display/FLUSS/FIP-3%3A+Support+tiering+Fluss+data+to+Iceberg)),
+which transforms Iceberg from a batch-oriented table format into a 
continuously updating Lakehouse. Apache Fluss acts as the **real-time ingestion 
and storage layer**, writing fresh data and updates into Iceberg with 
guaranteed ordering and exactly-once semantics.
+
+This enables real-time data on Fluss to be tiered as Apache Iceberg tables, 
while providing table semantics like partitioning and bucketing on a single 
copy of data.
+Moreover, it solves Iceberg’s long-standing update limitations through Fluss’s 
**native support for upserts and deletes** and its **built-in compaction 
service**,
+which automatically merges small files and maintains optimized Iceberg 
snapshots.
+
+Key benefits include:
+- **Unified Architecture**: Fluss handles sub-second streaming reads and 
writes, while Iceberg stores compacted historical data.
+- **Native Updates and Deletes**: Fluss efficiently applies changes and tiers 
them into Iceberg without rewrite jobs.
+- **Built-in Compaction Service**: The built-in service maintains snapshot 
efficiency with no external tooling.
+- **Efficient Backfilling**: Enables lightning-fast backfill of historical 
data from Iceberg for streaming processing.
+- **Lower Cost**: Reduce storage cost by tiering cold data to Iceberg while 
keeping hot data in Fluss, eliminating the need for duplicate storage.
+- **Lower Latency**: Sub-second data freshness for Iceberg tables by Union 
Read from Fluss and Iceberg.
+
+```yaml title='server.yaml'
+# Iceberg configuration
+datalake.format: iceberg
+
+# Iceberg catalog configuration, assuming a Hadoop catalog
+datalake.iceberg.type: hadoop
+datalake.iceberg.warehouse: /path/to/iceberg
+```
+
+You can find more detailed instructions in the 
[documentation](/docs/next/streaming-lakehouse/integrate-data-lakes/iceberg/).
+
+## Real-Time Multimodal AI Analytics with Lance
+
+Another major enhancement in Fluss 0.8 is the addition of **Streaming 
Lakehouse support for [Lance](https://github.com/lancedb/lance)** 
([FIP-5](https://cwiki.apache.org/confluence/display/FLUSS/FIP-5%3A+Support+tiering+Fluss+data+to+Lance)),
+a modern columnar and vector-native data format designed for AI and machine 
learning workloads.
+This integration extends Apache Fluss towards being a real-time ingestion 
platform for multi-modal data & AI,
+not just traditional tabular streams, but also embeddings, vectors, and 
unstructured features used in AI systems.
+With this release, Fluss can continuously ingest, update, and tier data into 
Lance tables with guaranteed ordering and freshness,
+enabling fast synchronization between streaming pipelines and downstream ML or 
retrieval applications.
+
+Key benefits include:
+
+- **Unified multi-modal data ingestion**: Stream tabular, vector, and 
embedding data into Lance in real time.
+- **AI/ML-ready storage**: Keep feature vectors and embeddings continuously 
up-to-date for model training or inference.
+- **Low-latency analytics and retrieval**: Fast, continuous updates enable 
Lance data to be immediately usable for real-time search and recommendation.
+- **Simplified architecture**: Eliminates complex ETL pipelines between 
streaming systems and vector databases.
+
+- **Seamless integration**: Combines Fluss’s high-throughput streaming engine 
with Lance’s efficient columnar persistence for consistent, multi-modal data 
management.
+
+```yaml title='server.yaml'
+datalake.format: lance
+datalake.lance.warehouse: s3://<bucket>
+datalake.lance.endpoint: <endpoint>
+datalake.lance.allow_http: true
+datalake.lance.access_key_id: <access_key_id>
+datalake.lance.secret_access_key: <secret_access_key>
+```
+
+See the [LanceDB blog post](https://lancedb.com/blog/fluss-integration/) for 
the full integration. You can also find more detailed instructions in the 
[documentation](/docs/next/streaming-lakehouse/integrate-data-lakes/lance/).
+
+## Flink 2.1
+
+Apache Fluss is now fully compatible with **Apache Flink 2.1**, ensuring 
seamless integration with the latest Flink runtime and APIs.
+This update strengthens Fluss’s role as a unified streaming storage layer, 
providing reliable performance and consistency for modern data pipelines built 
on Flink.
+
+### Delta Join

Review Comment:
   Since this is listed under `Flink 2.1`, it is implicitly an integration 
feature with Flink. I’ll keep the title as is for conciseness. We’ve already 
noted, `This release introduces support for Delta Joins with Apache Flink`, so 
the context should be clear.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

