Myasuka commented on code in PR #1770:
URL: https://github.com/apache/incubator-paimon/pull/1770#discussion_r1288014241
##########
paimon-flink/paimon-flink-common/src/test/java/org/apache/paimon/flink/LookupJoinITCase.java:
##########
@@ -517,6 +517,29 @@ public void testRetryLookup() throws Exception {
iterator.close();
}
+ @Test
+ public void testAsyncRetryLookup() throws Exception {
Review Comment:
This unit test does not seem to differ from the existing
`testRetryLookup`. Do you think we should test the `async` behavior alongside
the existing sync lookup tests?
##########
docs/content/how-to/lookup-joins.md:
##########
@@ -78,14 +82,42 @@ FOR SYSTEM_TIME AS OF o.proc_time AS c
ON o.customer_id = c.id;
```
-The lookup join operator will maintain a RocksDB cache locally and pull the
latest updates of the table in real time. Lookup join operator will only pull
the necessary data, so your filter conditions are very important for
performance.
+### Retry Lookup
-This feature is only suitable for tables containing at most tens of millions
of records to avoid excessive use of local disks.
-
-{{< hint info >}}
If the records of `Orders` (main table) join missing because the corresponding
data of `customers` (lookup table) is not ready.
You can consider using Flink's [Delayed Retry Strategy For
Lookup](https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/hints/#3-enable-delayed-retry-strategy-for-lookup).
-{{< /hint >}}
+Only for Flink 1.16+.
+
+```sql
+-- enrich each order with customer information
+SELECT /*+ LOOKUP('table'='c', 'retry-predicate'='lookup_miss',
'retry-strategy'='fixed_delay', 'fixed-delay'='1s', 'max-attempts'='600') */
+o.order_id, o.total, c.country, c.zip
+FROM Orders AS o
+JOIN customers
+FOR SYSTEM_TIME AS OF o.proc_time AS c
+ON o.customer_id = c.id;
+```
+
+### Async Retry Lookup
+
+The problem with synchronous retry is that one record will block subsequent
records, causing the entire job to be blocked.
+You can consider using async to avoid blocking.
Review Comment:
I think the `allow_unordered` output mode should also be mentioned in the
docs. Unordered output might produce unexpected results; however, ordered
output could impact performance significantly. We cannot claim that async retry
resolves the poor performance unless unordered output is enabled.
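
For illustration, the docs could show something along these lines. This is only a sketch: it assumes the `async` and `output-mode` options of Flink's `LOOKUP` hint (Flink 1.16+) are honored by the Paimon lookup source, and it reuses the table names and retry values from the sync example in this diff.

```sql
-- Sketch only: async retry lookup with unordered output.
-- Assumes Flink 1.16+ LOOKUP hint options 'async' and 'output-mode'.
SELECT /*+ LOOKUP('table'='c', 'async'='true', 'output-mode'='allow_unordered',
  'retry-predicate'='lookup_miss', 'retry-strategy'='fixed_delay',
  'fixed-delay'='1s', 'max-attempts'='600') */
o.order_id, o.total, c.country, c.zip
FROM Orders AS o
JOIN customers
FOR SYSTEM_TIME AS OF o.proc_time AS c
ON o.customer_id = c.id;
```

With `'output-mode'='ordered'` instead, a record waiting on retry would still hold back the records behind it, which is the performance concern above.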