[jira] [Comment Edited] (FLINK-25205) Optimize SinkUpsertMaterializer

Lsw_aka_laplace (Jira) Tue, 07 Dec 2021 01:20:10 -0800


    [ 
https://issues.apache.org/jira/browse/FLINK-25205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17454472#comment-17454472
 ]


Lsw_aka_laplace edited comment on FLINK-25205 at 12/7/21, 9:19 AM:
-------------------------------------------------------------------

UPDATE 

After discussing with [~lzljs3620320], I realized that my question does not 
actually apply in this situation. Please ignore it. 

Thanks, [~lzljs3620320], for your patience.

------

Hi [~lzljs3620320], after looking through `SinkUpsertMaterializer`, I have 
one question about this issue. Is all data from one changelog stream naturally 
split by checkpoints? Suppose an UPDATE_BEFORE row and its UPDATE_AFTER row 
happen to be separated by checkpoint T: as far as I understand, the 
UPDATE_BEFORE row belongs to checkpoint T but the UPDATE_AFTER row belongs to 
checkpoint T+1. What should we do in this situation?

If not, would you mind giving some explanation on this? 

Cheers~


was (Author: neighborhood):
Hi [~lzljs3620320], after looking through `SinkUpsertMaterializer`, I have 
one question about this issue. Is all data from one changelog stream naturally 
split by checkpoints? Suppose an UPDATE_BEFORE row and its UPDATE_AFTER row 
happen to be separated by checkpoint T: as far as I understand, the 
UPDATE_BEFORE row belongs to checkpoint T but the UPDATE_AFTER row belongs to 
checkpoint T+1. What should we do in this situation?

If not, would you mind giving some explanation on this? 

Cheers~

> Optimize SinkUpsertMaterializer
> -------------------------------
>
>                 Key: FLINK-25205
>                 URL: https://issues.apache.org/jira/browse/FLINK-25205
>             Project: Flink
>          Issue Type: Improvement
>          Components: Table SQL / Runtime
>            Reporter: Jingsong Lee
>            Priority: Major
>
> SinkUpsertMaterializer maintains incoming records in state corresponding to 
> the upsert keys and generates an upsert view for the downstream operator.
> It is intended to solve the out-of-order problem caused by the upstream 
> computation, but it stores the data in state, which keeps growing.
> If we can assume that the disorder only occurs within a checkpoint, we can 
> consider cleaning up the state at each checkpoint, which would keep the size 
> of the state under control.
> We can consider adding a config option for this optimization first.
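
For readers unfamiliar with the operator, the behavior described above can be sketched roughly as follows. This is a simplified, hypothetical model rather than Flink's actual implementation: the class `UpsertMaterializerSketch`, its `Row` and `Kind` types, and the `retainedRows` helper are made up for illustration, and state is assumed to be a plain in-memory map from key to the list of values seen so far.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified model of an upsert materializer; not Flink's operator. */
final class UpsertMaterializerSketch {

    /** Changelog row kinds, named after the kinds mentioned in this thread. */
    enum Kind { INSERT, UPDATE_BEFORE, UPDATE_AFTER, DELETE }

    /** A changelog row: kind, key, and a single payload value. */
    static final class Row {
        final Kind kind;
        final String key;
        final String value;

        Row(Kind kind, String key, String value) {
            this.kind = kind;
            this.key = key;
            this.value = value;
        }

        @Override
        public String toString() {
            return kind + "(" + key + ", " + value + ")";
        }
    }

    /** Assumed state layout: every value still "alive" per key, in arrival order. */
    private final Map<String, List<String>> state = new HashMap<>();

    /** Processes one input row and returns the rows emitted downstream. */
    List<Row> process(Row in) {
        List<Row> out = new ArrayList<>();
        List<String> values = state.computeIfAbsent(in.key, k -> new ArrayList<>());
        if (in.kind == Kind.INSERT || in.kind == Kind.UPDATE_AFTER) {
            // An addition always becomes the newest value for the key.
            Kind emitted = values.isEmpty() ? Kind.INSERT : Kind.UPDATE_AFTER;
            out.add(new Row(emitted, in.key, in.value));
            values.add(in.value);
        } else {
            // A retraction removes the matching value, wherever it sits in the list.
            int idx = values.lastIndexOf(in.value);
            if (idx >= 0) {
                boolean wasLatest = idx == values.size() - 1;
                values.remove(idx);
                if (values.isEmpty()) {
                    out.add(new Row(Kind.DELETE, in.key, in.value));
                } else if (wasLatest) {
                    // Fall back to the previous value that is still alive.
                    out.add(new Row(Kind.UPDATE_AFTER, in.key, values.get(values.size() - 1)));
                }
                // A late retraction of an older value emits nothing.
            }
            if (values.isEmpty()) {
                state.remove(in.key);
            }
        }
        return out;
    }

    /** Number of values currently retained across all keys. */
    int retainedRows() {
        int n = 0;
        for (List<String> v : state.values()) {
            n += v.size();
        }
        return n;
    }
}
```

In this model, if the rows for key `k` arrive as INSERT `a`, UPDATE_AFTER `b`, and only then UPDATE_BEFORE `a`, the late retraction finds `a` in the list, removes it, and emits nothing, so the downstream view keeps `b`. The cost is that values have to stay in the map until they are retracted, which is the state growth the issue wants to bound.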



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
