xuyangzhong commented on code in PR #27191:
URL: https://github.com/apache/flink/pull/27191#discussion_r2509251066
##########
docs/content/docs/dev/table/tuning.md:
##########
@@ -368,3 +368,52 @@ FROM TenantKafka t
LEFT JOIN InventoryKafka i ON t.tenant_id = i.tenant_id AND ...;
```
+## Delta Joins
+
+In stream jobs, regular joins store all historical data from both sides of the
input to ensure the accuracy of the computation results. When an input record
is received, regular joins query the state of the other side to find matching
records to output, while simultaneously updating its own state.
+However, as the job runs for a long term and the input data increases, the
state of regular joins will gradually grow larger. This may lead to
computational resources becoming a bottleneck, impacting the overall
performance of the job and potentially causing a series of stability issues.
+
+To address this, we have introduced a new delta join operator. The core idea
is to leverage a bidirectional lookup join approach to reuse data from source
tables, replacing the state required by regular joins. Compared to traditional
regular joins, delta joins significantly reduce the state size, improve the
stability of the job, and also decrease the demand for computational resources.
+
+This feature is currently enabled by default and regular join will be
optimized into delta join when the following conditions are simultaneously met:
+
+1. The sql pattern satisfies the optimization criteria. For details, please
refer to [Supported Features and Limitations]({{< ref "docs/dev/table/tuning"
>}}#supported-features-and-limitations)
+2. The external storage system of the source table provides index information
for fast querying for delta joins. Currently, [Apache
Fluss(Incubating)](https://fluss.apache.org/blog/fluss-open-source/) has
provided index information at the table level for Flink, allowing such tables
to be used as source tables for delta joins. Please refer to the [Fluss
documentation](https://fluss.apache.org/docs/0.8/engine-flink/delta-joins/#flink-version-support)
for more details.
+
+### Working Principle
+
+In Flink, regular joins require completely storing incoming data from both
sides in the state, matching that data when the opposite side's data arrives.
In contrast, delta joins utilize the indexing capabilities provided by external
storage systems to convert the behavior of querying state data into efficient
queries against data in the external storage system using index keys. This
approach avoids the need for duplicate storage of the same data in both the
external storage system and the state.
Review Comment:
I have updated
```
In Flink, regular joins store all incoming records from both input sides in
the state to ensure that future records can be matched correctly when
corresponding data arrives from the opposite side.
```
to
```
In Flink, regular joins store all incoming records from both input sides in
the state to ensure that corresponding records can be matched correctly when
data arrives from the opposite side.
```
to remove the confusing `future records`
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]