xuyangzhong commented on code in PR #27191:
URL: https://github.com/apache/flink/pull/27191#discussion_r2498260098


##########
docs/content/docs/dev/table/tuning.md:
##########
@@ -368,3 +368,55 @@ FROM TenantKafka t
          LEFT JOIN InventoryKafka i ON t.tenant_id = i.tenant_id AND ...;
 ```
 
+## Delta Joins
+
+In streaming jobs, the large state of regular join has been a persistent 
concern for users.  Since streaming jobs are long-running, the state of regular 
joins generally increases in size over time, leading to a series of issues, 
including but not limited to:
+
+1. Excessive consumption of computing or storage resources, which becomes a 
bottleneck for the job.
+2. Slow checkpointing, which affects job stability.
+3. Long recovery time from state during failures or restarts.
+
+Delta join primarily addresses these issues. The core idea of delta join is to 
leverage a bidirectional lookup join approach to reuse data from source tables, 
thereby replacing the state required by regular joins. Compared to traditional 
regular joins, delta joins significantly reduce the state size, ensure job 
stability, decrease computing resource requirements, and accelerate job 
recovery times.
+
+This feature is currently enabled by default and regular join will be 
optimized into delta join when the following conditions are simultaneously met:
+
+1. The sql pattern satisfies the optimization criteria. For details, please 
refer to [Supported Features and Limitations]({{< ref "docs/dev/table/tuning" 
>}}#supported-features-and-limitations)
+2. The external storage system of the source table provides index information 
for fast querying for delta joins. Currently, [Apache 
Fluss(Incubating)](https://fluss.apache.org/blog/fluss-open-source/) has 
provided index information at the table level for Flink, allowing such tables 
to be used as source tables for delta joins. Please refer to the [Fluss 
documentation](https://fluss.apache.org/docs/0.8/engine-flink/delta-joins/#flink-version-support)
 for more details.
+
+### Working Principle
+
+In Flink, regular joins require completely storing incoming data from both 
sides in the state, matching that data when the opposite side's data arrives. 
In contrast, delta joins utilize the indexing capabilities provided by external 
storage systems to convert the behavior of querying state data into efficient 
queries against data in the external storage system using index keys. This 
approach avoids the need for duplicate storage of the same data in both the 
external storage system and the state.
+
+{{< img src="/fig/table-streaming/delta_join.png" width="100%" height="100%" 
>}}

Review Comment:
   You're right.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to