Hi all,

Change Data Capture (CDC) is essential for real-time data processing,
widely used in scenarios like incremental ETL, data synchronization, and
audit logging. However, Fluss currently lacks a standardized mechanism to
expose changelog and binlog data through Flink SQL, forcing users to
implement custom change tracking solutions.

Without native support, users face several challenges:

   - No SQL interface to query operation types (INSERT, UPDATE, DELETE)
   - No efficient access to change data for incremental processing
   - No consistent audit trails across applications
   - No visibility into before/after states for UPDATE operations

To address this, I'd like to propose FIP-20: Introduce $changelog and
$binlog Virtual Tables [1].

[1]
https://cwiki.apache.org/confluence/display/FLUSS/FIP-20+Introduce+%24changelog+and+%24binlog+Virtual+Tables+in+Flink+Engine

This proposal introduces:

   - *$changelog virtual table*: Flat schema with metadata columns (
   _change_type, _log_offset, _commit_timestamp) exposing change operations
   (+I, -U, +U, -D for PK tables; +A for Log tables)
   - *$binlog virtual table*: Nested schema with before/after ROW columns
   providing Debezium-style CDC format with event-level semantics (I, U, D)
   - *Schema introspection*: Support for DESCRIBE and SHOW CREATE TABLE on
   virtual tables
   - *Broad table support*: $changelog works for both Primary Key tables
   and Log tables; $binlog for Primary Key tables only

This follows the existing $lake virtual table pattern, ensuring consistency
across Fluss's virtual table design.

Any feedback and suggestions are welcome!

Best regards,
Mehul

Reply via email to