[ https://issues.apache.org/jira/browse/FLINK-25205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17454485#comment-17454485 ]
Jingsong Lee commented on FLINK-25205:
--------------------------------------
At the next checkpoint, this update_before is no longer needed. When writing to
the upsert sink, it is enough to write only the latest update_after, because
there is no out-of-order data between the two checkpoints.
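
A minimal sketch of the idea, to make the behaviour concrete. This is a
simplified model in Python, not Flink's SinkUpsertMaterializer: the class name,
the method names, and the "+I" / "+U" / "-U" / "-D" change-kind strings are
hypothetical and chosen only for illustration, and every accepted row is
emitted as "+U" for brevity. It assumes that disorder never crosses a
checkpoint boundary, so the per-key state can be dropped at each checkpoint
and a retraction that arrives afterwards for an older row is simply ignored.

```python
class CheckpointScopedMaterializer:
    """Hypothetical, simplified model of a per-key upsert materializer
    whose state is cleared at every checkpoint."""

    def __init__(self):
        # key -> list of rows seen for that key since the last checkpoint
        self.rows_by_key = {}

    def process(self, kind, key, row):
        """Handle one changelog record and return the records to emit."""
        rows = self.rows_by_key.get(key, [])
        if kind in ("+I", "+U"):
            # Additions are remembered and forwarded as the latest value.
            self.rows_by_key.setdefault(key, []).append(row)
            return [("+U", key, row)]

        # Retraction ("-U" or "-D").
        if row not in rows:
            # The row was added before the last checkpoint (or never seen).
            # Under the stated assumption it is stale, so nothing is emitted.
            return []

        # Remove the most recent occurrence of the retracted row.
        idx = len(rows) - 1 - rows[::-1].index(row)
        was_latest = idx == len(rows) - 1
        del rows[idx]

        if not rows:
            # Nothing left for this key in the current interval.
            del self.rows_by_key[key]
            return [("-D", key, row)]
        # Only a retraction of the latest row changes what the sink sees.
        return [("+U", key, rows[-1])] if was_latest else []

    def on_checkpoint(self):
        # State is bounded by the amount of data in one checkpoint interval.
        self.rows_by_key.clear()
```

In this sketch the memory held per key is limited to what arrived since the
last call to on_checkpoint(). The cost is that the result is only correct if
the assumption holds: a retraction that crosses a checkpoint is dropped
instead of reverting the sink to an earlier row.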
> Optimize SinkUpsertMaterializer
> -------------------------------
>
> Key: FLINK-25205
> URL: https://issues.apache.org/jira/browse/FLINK-25205
> Project: Flink
> Issue Type: Improvement
> Components: Table SQL / Runtime
> Reporter: Jingsong Lee
> Priority: Major
>
> SinkUpsertMaterializer maintains incoming records in state corresponding to
> the upsert keys and generates an upsert view for the downstream operator.
> It is intended to solve the out-of-order problem caused by the upstream
> computation, but it stores the data in state, which keeps growing over
> time.
> If we can assume that the disorder only occurs within a checkpoint, we can
> consider cleaning up the state at each checkpoint, which would bound the size
> of the state.
> We can consider first adding a config option for this optimization.