FangYongs opened a new issue, #3428:
URL: https://github.com/apache/paimon/issues/3428

   ### Search before asking
   
   - [X] I searched in the [issues](https://github.com/apache/paimon/issues) 
and found nothing similar.
   
   
   ### Motivation
   
   When Paimon is used as the source for foreign key joins, it is usually 
necessary to look up the source table. 
   
   For example, there are two tables:
   1) Table `A(a, b, c, c1, c2, c3, c4, c5)`, where `a` is the primary key
   2) Table `B(c, d, e, e1, e2, e3, e4, e5)`, where `c` is the primary key
   
   Now we need to perform `A JOIN B` on `A.c = B.c` and output the result `(a, 
b, c, d, e, c1, c2, c3, c4, c5, e1, e2, e3, e4, e5)`. 
   
   In Flink, we can convert this join into a primary key join. We first 
join `A (a, c)` with `B (c)` to obtain the matching `(a, c)` pairs, then look 
up `A` by `a` and `B` by `c` for each pair, and finally output the resulting 
rows.
   During this process, because the Paimon dimension table loads incremental 
data with a delay (10 seconds by default), a lookup for a matching `(a, c)` 
pair may not find the corresponding rows of `A` and `B` in time, which 
produces incorrect output.
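
   The two-phase join described above can be illustrated with a toy model. 
This is only a sketch in plain Python, not Flink or Paimon code: the dicts 
stand in for tables, and `narrow_join`, `enrich`, `a_view`, and `b_view` are 
hypothetical names introduced for illustration. It shows how a lookup view 
that has not yet loaded the latest increment yields a missing result.

```python
# Toy model of the two-phase join; every name here is hypothetical.
# Full rows as the writer sees them (the latest data), keyed by primary key.
A_latest = {1: {"a": 1, "b": "b1", "c": 10, "c1": "x"}}
B_latest = {10: {"c": 10, "d": "d1", "e": "e1"}}


def narrow_join(a_rows, b_rows):
    """Phase 1: join only the key columns, A(a, c) with B(c)."""
    return [(row["a"], row["c"]) for row in a_rows.values() if row["c"] in b_rows]


def enrich(pairs, a_view, b_view):
    """Phase 2: look up the full row of A by `a` and of B by `c`."""
    out = []
    for a_key, c_key in pairs:
        a_row = a_view.get(a_key)
        b_row = b_view.get(c_key)
        if a_row is None or b_row is None:
            # The lookup view has not loaded this row yet.
            out.append((a_key, c_key, None))
        else:
            out.append((a_key, c_key, {**a_row, **b_row}))
    return out


pairs = narrow_join(A_latest, B_latest)

# Lookup views that lag behind the writer (nothing loaded yet).
stale = enrich(pairs, a_view={}, b_view={})

# Lookup views that have caught up with the writer.
fresh = enrich(pairs, a_view=A_latest, b_view=B_latest)
```

   Here `stale` contains a pair with no enriched row, while `fresh` contains 
the complete joined row; the gap between the two is the window in which the 
output can be incorrect.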
   
   To solve this issue, I'd like to introduce a key-value cache in Paimon for 
the lookup operator. When data is written to Paimon, it can be written to the 
key-value cache before the snapshot is created. Then, when the downstream 
operator reads data from Paimon, it can always look up the correct data from 
the key-value cache.
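
   A minimal sketch of the idea, assuming a simple in-memory model: 
`CachedTable`, `kv_cache`, and `commit_snapshot` are hypothetical names, not 
existing Paimon APIs, and concerns such as eviction, fault tolerance, and 
distribution are left out. Writes land in the cache first, and lookups check 
the cache before falling back to snapshot-visible data.

```python
class CachedTable:
    """Hypothetical table model: snapshot rows plus a key-value cache."""

    def __init__(self):
        self.snapshot = {}  # rows visible through committed snapshots
        self.kv_cache = {}  # rows written but not yet part of a snapshot

    def write(self, key, row):
        # Data goes to the cache before any snapshot is created.
        self.kv_cache[key] = row

    def commit_snapshot(self):
        # Creating a snapshot makes the cached rows durable and visible.
        self.snapshot.update(self.kv_cache)
        self.kv_cache.clear()

    def lookup(self, key):
        # Prefer the cache so that uncommitted writes are still found.
        if key in self.kv_cache:
            return self.kv_cache[key]
        return self.snapshot.get(key)


table = CachedTable()
table.write(1, {"a": 1, "b": "b1", "c": 10})

before_commit = table.lookup(1)  # served from the key-value cache
table.commit_snapshot()
after_commit = table.lookup(1)   # served from the snapshot
```

   In this model the lookup returns the same row before and after the 
snapshot is committed, which is the property the proposal is after.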
   
   ### Solution
   
   _No response_
   
   ### Anything else?
   
   _No response_
   
   ### Are you willing to submit a PR?
   
   - [ ] I'm willing to submit a PR!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
