[GitHub] [incubator-paimon] Myasuka commented on a diff in pull request #1770: [flink] Flink Lookup supports async mode

via GitHub Tue, 08 Aug 2023 23:37:55 -0700


Myasuka commented on code in PR #1770:
URL: https://github.com/apache/incubator-paimon/pull/1770#discussion_r1288014241



##########
paimon-flink/paimon-flink-common/src/test/java/org/apache/paimon/flink/LookupJoinITCase.java:
##########
@@ -517,6 +517,29 @@ public void testRetryLookup() throws Exception {
         iterator.close();
     }
 
+    @Test
+    public void testAsyncRetryLookup() throws Exception {

Review Comment:
   This unit test does not seem to differ from the existing 
`testRetryLookup`. Do you think we should test the `async` behavior with the 
existing sync lookup tests?



##########
docs/content/how-to/lookup-joins.md:
##########
@@ -78,14 +82,42 @@ FOR SYSTEM_TIME AS OF o.proc_time AS c
 ON o.customer_id = c.id;
 ```
 
-The lookup join operator will maintain a RocksDB cache locally and pull the 
latest updates of the table in real time. Lookup join operator will only pull 
the necessary data, so your filter conditions are very important for 
performance.
+### Retry Lookup
 
-This feature is only suitable for tables containing at most tens of millions 
of records to avoid excessive use of local disks.
-
-{{< hint info >}}
 If the records of `Orders` (main table) join missing because the corresponding 
data of `customers` (lookup table) is not ready.
 You can consider using Flink's [Delayed Retry Strategy For 
Lookup](https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/sql/queries/hints/#3-enable-delayed-retry-strategy-for-lookup).
-{{< /hint >}}
+Only for Flink 1.16+.
+
+```sql
+-- enrich each order with customer information
+SELECT /*+ LOOKUP('table'='c', 'retry-predicate'='lookup_miss', 
'retry-strategy'='fixed_delay', 'fixed-delay'='1s', 'max-attempts'='600') */
+o.order_id, o.total, c.country, c.zip
+FROM Orders AS o
+JOIN customers
+FOR SYSTEM_TIME AS OF o.proc_time AS c
+ON o.customer_id = c.id;
+```
+
+### Async Retry Lookup
+
+The problem with synchronous retry is that one record will block subsequent 
records, causing the entire job to be blocked.
+You can consider using async to avoid blocking.

Review Comment:
   I think the `allow_unordered` output mode should also be mentioned in the 
docs. Unordered output might produce unexpected results, but ordered output 
could significantly hurt performance. We cannot claim that async retry resolves 
the poor performance unless the output is unordered.
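
   For illustration, here is a rough sketch of how the docs example might look 
with the output mode spelled out. It assumes the `async` and `output-mode` 
options of Flink's `LOOKUP` hint (Flink 1.16+) and reuses the table names and 
retry settings from the diff above. Any Paimon-side option needed to enable 
async lookup is not shown here.

```sql
-- sketch only: async retry lookup that permits unordered output
-- 'async' and 'output-mode' are Flink LOOKUP hint options; the remaining
-- options are copied from the sync retry example in this diff
SELECT /*+ LOOKUP('table'='c', 'async'='true', 'output-mode'='allow_unordered', 'retry-predicate'='lookup_miss', 'retry-strategy'='fixed_delay', 'fixed-delay'='1s', 'max-attempts'='600') */
o.order_id, o.total, c.country, c.zip
FROM Orders AS o
JOIN customers
FOR SYSTEM_TIME AS OF o.proc_time AS c
ON o.customer_id = c.id;
```

   As far as I know, with `'output-mode'='ordered'` a record waiting on retry 
still holds back the records behind it, so the docs should state which mode the 
example relies on.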



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
