MehulBatra commented on code in PR #2590:
URL: https://github.com/apache/fluss/pull/2590#discussion_r2807155842


##########
website/blog/releases/0.9.md:
##########
@@ -0,0 +1,293 @@
+---
+title: "Apache Fluss (Incubating) 0.9 Release Announcement"
+sidebar_label: "Apache Fluss 0.9 Release Announcement"
+authors: [giannis, jark]
+date: 2026-02-26
+tags: [releases]
+---
+
+![Banner](../assets/0.9/banner.png)
+
+🌊 We are excited to announce the official release of **Apache Fluss 
(Incubating) 0.9**!
+
+This release marks a major milestone for the project. Fluss 0.9 significantly 
expands Fluss’s capabilities as a **streaming storage system for real-time 
analytics, AI, and state-heavy streaming workloads**, with a strong focus on:
+
+- Richer and more flexible data models
+- Safe, zero-downtime schema evolution
+- Storage-level optimizations (aggregations, CDC, formats)
+- Stronger operational guarantees and scalability
+- A more mature ecosystem and developer experience
+
+Whether you’re building **unified stream & lakehouse architectures**, 
**real-time analytics**, **feature/context stores**, or **long-running stateful 
pipelines**, Fluss 0.9 introduces powerful new primitives that make these 
systems easier, safer, and more efficient to operate at scale.
+
+---
+
+### TL;DR: What Fluss 0.9 Unlocks
+- **Zero-copy schema evolution** for evolving streaming jobs
+- **Storage-level aggregations** that further enhance zero-state processing
+- **Virtual tables** for CDC, audit trails, point-in-time recovery, and ML 
reproducibility
+- **Safer snapshot-based reads** with consumer-aware lifecycle management
+- **Operationally robust clusters** with automatic rebalancing and safer 
maintenance workflows
+- **Apache Spark integration**, enabling unified batch and streaming analytics 
on Fluss
+- **First-class Azure support**, allowing Fluss to tier and operate seamlessly 
on Azure Blob Storage and ADLS Gen2
+
+
+<!-- truncate -->
+
+## 1. Richer Data Models & Schema Evolution
+
+### Support for Complex Data Types
+Apache Fluss 0.9 strengthens and extends support for **complex data types** 
with a focus on **deep nesting**, **safe schema evolution**, and new 
**ML-oriented use cases**.
+
+Supported types now include `ARRAY`, `MAP`, `ROW`, and deeply nested 
structures. For example, complex schemas such as:
+> `ARRAY<MAP<STRING, ROW<values ARRAY<FLOAT>, ts TIMESTAMP_LTZ(3)>>>`
+
+are now fully supported in production. These nested structures are handled as 
**schema-aware** rows, not opaque payloads, ensuring **data correctness** and 
**type safety**.
+
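+As an illustrative sketch (the table and column names here are hypothetical), a deeply nested schema like the one above can be declared directly in Flink SQL:
+
+```sql
+-- Hypothetical table: per-device readings with an array of maps from metric
+-- name to a row of float values plus an event timestamp.
+CREATE TABLE sensor_readings (
+    device_id STRING,
+    metrics ARRAY<MAP<STRING, ROW<`values` ARRAY<FLOAT>, ts TIMESTAMP_LTZ(3)>>>,
+    PRIMARY KEY (device_id) NOT ENFORCED
+);
+```
+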
+Additionally, adding new columns does not affect existing jobs once clients upgrade to 0.9. With support for the **Lance format**, Fluss can also be used to ingest **multi-modal data** and serve as **vector storage**. Users can now store **embeddings** directly in tables using `ARRAY<FLOAT>` or `ARRAY<DOUBLE>`:
+
+```sql
+CREATE TABLE documents (
+  doc_id BIGINT,
+  embedding ARRAY<FLOAT>,
+  PRIMARY KEY (doc_id) NOT ENFORCED
+);
+```
+
+This enables new use cases where Fluss acts as the **source of truth for 
vector embeddings**, which can then be incrementally consumed by vector engines 
to maintain **ANN indexes**.
+
+You can find the complete feature umbrella 
[here](https://github.com/apache/fluss/issues/816).
+
+
+### Schema Evolution with Zero-Copy Semantics
+**Schema evolution** is critical for evolving systems and use cases like 
**feature calculations**, and Fluss 0.9 delivers a major step forward in this 
area.
+
+The release adds support for **altering table schemas** by appending new 
columns, fully integrated with **Flink SQL** and **Fluss DDL**. For example:
+```sql
+-- Add a single column at the end of the table
+ALTER TABLE my_table ADD user_email STRING COMMENT 'User email address';
+
+-- Add multiple columns at the end of the table
+ALTER TABLE my_table ADD (
+    user_email STRING COMMENT 'User email address',
+    order_quantity INT
+);
+```
+
+For more information, see [Flink DDL 
support](https://fluss.apache.org/docs/next/engine-flink/ddl/#alter-table).
+
+**Zero-copy schema evolution** means that existing data files are **not 
rewritten** when a schema changes. Instead, only **metadata is updated**.
+
+Existing records simply do not contain the new column, and readers interpret 
missing fields as `NULL` or default values. New records immediately include the 
new column without impacting historical data.
+
+This approach **avoids downtime**, **eliminates expensive backfills**, and 
ensures **predictable performance** during schema changes. It is especially 
important for streaming pipelines that are expected to run continuously over 
long periods of time.
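+
+As a minimal sketch of what this means for readers (reusing the `my_table` example from above; the `id` column is hypothetical), rows written before the schema change simply surface the new column as `NULL`:
+
+```sql
+-- After ALTER TABLE my_table ADD user_email STRING ..., no data files are
+-- rewritten; only table metadata changes.
+SELECT id, user_email FROM my_table;
+-- Rows written before the change return NULL for user_email;
+-- rows written afterwards carry the column immediately.
+```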
+
+## 2. Storage-Level Processing & Semantics
+
+### Aggregation Merge Engine
+Apache Fluss now supports **storage-level aggregations** via a new 
**Aggregation Merge Engine**, enabling real-time aggregation to be pushed down 
from the compute layer into the **Fluss storage layer**.
+
+Traditionally, real-time aggregations are maintained in Flink state, which can 
lead to:
+* **Large and growing state size** 
+* **Slower checkpoints and recovery** 
+* **Limited scalability** for high-cardinality aggregations
+  
+With the **Aggregation Merge Engine**, aggregation state is **externalized to 
Fluss**, allowing Flink jobs to remain **nearly stateless** while Fluss 
efficiently maintains aggregated results.
+```sql
+CREATE TABLE campaign_uv (
+    campaign_id STRING,
+    uv_bitmap BYTES,
+    total_events BIGINT,
+    last_event_time TIMESTAMP,
+    PRIMARY KEY (campaign_id) NOT ENFORCED
+) WITH (
+    'table.merge-engine' = 'aggregation',
+    'fields.uv_bitmap.agg' = 'rbm64',
+    'fields.total_events.agg' = 'sum',
+    'fields.last_event_time.agg' = 'last_value_ignore_nulls'
+);
+```
+
+This lets you maintain **continuously updated metrics** (for example, order counts or aggregated model features) directly in Fluss tables, while Flink focuses only on event ingestion and lightweight processing.
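+
+As a hedged sketch of the write path (the `campaign_events` source table and its columns are hypothetical, and the bitmap column is omitted for brevity), the Flink job only writes per-event deltas and lets Fluss merge them by primary key:
+
+```sql
+-- The job stays nearly stateless: it emits one row per event, and Fluss
+-- sums total_events and keeps the latest non-null event time per campaign.
+INSERT INTO campaign_uv (campaign_id, total_events, last_event_time)
+SELECT campaign_id, 1, event_time
+FROM campaign_events;
+```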
+
+The aggregation merge engine is another step towards Fluss’s **compute-storage separation**; you can read more about it [here](https://www.ververica.com/blog/introducing-the-era-of-zero-state-streaming-joins?hs_preview=cdhHvcIE-199898654106).
+
+**Design Reference:** [FIP-21: Aggregation Merge 
Engine](https://cwiki.apache.org/confluence/display/FLUSS/FIP-21%3A+Aggregation+Merge+Engine)
+
+### Support for Compacted LogFormat
+By default, Fluss uses **Apache Arrow–based columnar storage**, which is ideal 
for analytical workloads with selective column access. However, some workloads 
do not benefit from columnar layouts, especially when **all columns are read 
together**.
+
+Fluss 0.9 introduces support for a **Compacted (row-oriented) LogFormat** to 
address these cases. This format is designed for tables such as **aggregated 
result tables** and **large vector or embedding tables**, where **full-row 
reads** are the dominant access pattern. In these scenarios, columnar storage 
provides limited benefit and can introduce unnecessary overhead.
+
+The **Compacted LogFormat** stores rows in a tightly packed, compact 
representation on disk, resulting in:
+* **Reduced disk footprint** for wide rows 
+* **More efficient full-table and wide-row scans** 
+* **Better storage efficiency** for derived and materialized tables
+
+Arrow remains the default and preferred choice for column-pruned analytical 
workloads, while the **Compacted LogFormat** provides a more efficient option 
for full-row, compacted datasets.
+
+You can find more information 
[here](https://fluss.apache.org/docs/0.9/table-design/data-formats/).
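+
+As a sketch of how a table might opt into the row-oriented format (the property value `'compacted'` shown here is illustrative; check the data-formats documentation linked above for the exact option name):
+
+```sql
+-- Hypothetical embeddings table that is always read as full rows, so the
+-- row-oriented log format is a better fit than the default Arrow layout.
+CREATE TABLE doc_embeddings (
+    doc_id BIGINT,
+    embedding ARRAY<FLOAT>
+) WITH (
+    'table.log.format' = 'compacted'
+);
+```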
+
+### Auto-Increment Columns for Dictionary Tables
+This release introduces `AUTO_INCREMENT` columns in Fluss, enabling 
**Dictionary Tables**, a simple pattern for mapping long identifiers (such as 
strings or UUIDs) to **compact numeric IDs** in real-time systems.
+
+`AUTO_INCREMENT` columns automatically assign a **unique numeric ID** when a 
row is inserted and no value is provided. The assigned ID is **stable** and 
never changes. In distributed setups, IDs may not appear strictly sequential 
due to parallelism and bucketing, but they are guaranteed to be **unique and 
monotonically increasing** per allocation range.
+
+A **Dictionary Table** is a regular Fluss table that uses an `AUTO_INCREMENT` 
column to map long business identifiers to compact integer IDs. 
+In simple terms, it gives every unique value a short number and always returns 
the same number for the same value.
+
+```sql
+CREATE TABLE uid_mapping (
+    uid STRING,
+    uid_int64 BIGINT,
+    PRIMARY KEY (`uid`) NOT ENFORCED
+) WITH (
+      'table.auto-increment.fields' = 'uid_int64'
+);
+
+INSERT INTO uid_mapping (uid) VALUES ('user1');
+INSERT INTO uid_mapping (uid) VALUES ('user2');
+INSERT INTO uid_mapping (uid) VALUES ('user3');
+INSERT INTO uid_mapping (uid) VALUES ('user4');
+INSERT INTO uid_mapping (uid) VALUES ('user5');
+
+SELECT * FROM uid_mapping;
+
+| uid   | uid_int64 |
+|-------|-----------|
+| user1 |     1     |
+| user2 |     2     |
+| user3 |     3     |
+| user4 |     4     |
+| user5 |     5     |
+```
+
+Dictionary Tables are commonly used to answer operational questions such as:
+* **Unique Counting**: How many unique users, devices, or sessions have we 
seen so far? 
+* **First-Seen Detection**: Is this the first time we are seeing this 
identifier? 
+* **Event Deduplication**: Have we already processed this event?
+* **Active Entity Tracking**: What is the current set of active 
entities/sessions?
+
+They also help keep **identity-related state** manageable over long periods 
and provide a **shared, consistent ID mapping** that can be reused across 
systems.
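+
+As one illustrative example of reusing the mapping (the `events` table, its processing-time attribute `proc_time`, and the lookup-join pattern shown here are hypothetical), downstream Flink jobs can swap wide string keys for the compact IDs:
+
+```sql
+-- Enrich a stream of events with the compact ID via a lookup join against
+-- the dictionary table (assumes events has a processing-time column proc_time).
+SELECT e.event_id, m.uid_int64, e.payload
+FROM events AS e
+JOIN uid_mapping FOR SYSTEM_TIME AS OF e.proc_time AS m
+ON e.uid = m.uid;
+```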
+
+### Virtual Tables (Changelog, CDC, Audit)
+Apache Fluss 0.9 introduces **Virtual Tables**, providing access to 
**metadata** and **change data** without storing additional data. By simply 
appending `$changelog` to any table name, users can access a complete **audit 
trail** of every data modification.
+
+#### Access the changelog of any table
+```sql
+SELECT * FROM `orders$changelog`;
+```
+![Virtual Table](../assets/0.9/vt.png)
+
+Each changelog record includes metadata columns prepended to the original 
table columns:
+- **`_change_type`**: The type of change operation (`+I`, `-U`/`+U`, `-D`, or 
`+A`).

Review Comment:
   Please refer to the new change_type
   https://fluss.apache.org/docs/0.9/table-design/virtual-tables/#change-types



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to