[
https://issues.apache.org/jira/browse/FLINK-23170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Roman Khachatryan updated FLINK-23170:
--------------------------------------
Fix Version/s: (was: 1.14.0)
1.15.0
> Write metadata after materialization
> ------------------------------------
>
> Key: FLINK-23170
> URL: https://issues.apache.org/jira/browse/FLINK-23170
> Project: Flink
> Issue Type: Sub-task
> Components: Runtime / State Backends
> Reporter: Roman Khachatryan
> Priority: Major
> Fix For: 1.15.0
>
>
> Currently, the changelog state backend writes state metadata to the
> changelog on first state access.
> On materialization, the changelog can be truncated, so the metadata needs to
> be written again.
>
> Below is a proposed solution using the existing metadataWritten flag.
> An alternative would be to write metadata at the end of the materialized
> stream.
> Yet another approach is to write metadata to a separate file (however, it
> seems less optimal than at the end of the materialized stream and not so easy
> as writing again).
> There are several questions to answer:
> - *When to mark* the metadata as not written (i.e. reset the metadataWritten
> flag)?
> ** After starting the materialization - so that any subsequent data is
> preceded by metadata
> - *When to request* the write (i.e. call append)
> At any point (mat. start / mat. end / checkpoint start). It doesn't
> matter for correctness - see the next points.
> Scheduling append earlier means:
> -- including metadata in the changelog twice unnecessarily (won't hurt
> correctness)
> -- writing for nothing if materialization fails
> Scheduling append later means slowing down the checkpoint.
> So appending at materialization end seems to be the better tradeoff.
> - *What* metadata to write?
> Only for data which were changed after materialization started (so the
> flag is enough)
> - *Where* in changelog to write it to?
> No choice but to the end of the changelog. Because of updating SQN, the
> metadata will appear at the beginning of the state object returned by
> persist(sqn) called after materialization completes.
> - *How to wait for write completion* (before completing checkpoint)?
> Once appended, the future returned from persist() call should include it
> already
>
> So to achieve this it's enough to call appendMetadata() for each changed
> state upon materialization start, or finish, or 1st checkpoint after it.
> —
> Another related change is to skip writing metadata on recovery (only if
> state was read from the changelog).
> This can be achieved by setting the flag when requesting the state from
> ChangeLogApplier.
> *Please create a separate ticket for that if not implementing in this one.*
> —
> Note: with TM-side state ownership, actual log truncation may be delayed
> after materialization (until all the checkpoints using the log are subsumed).
> This should not affect the above logic.
>
>
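The flag-based approach described above can be illustrated with a small toy model. This is only a sketch of one reading of the proposal, not Flink code: ToyChangelog, ToyState, the "META:"/"DATA:" entry strings, and the use of a list index as the SQN are all hypothetical stand-ins. It assumes that metadata is written lazily on the first update while metadataWritten is false, that the flag is reset at materialization start, and that a state restored from the changelog starts with the flag already set.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only; none of these types are Flink classes. */
final class MetadataAfterMaterializationSketch {

    /** In-memory stand-in for the changelog; an entry's index plays the role of its SQN. */
    static final class ToyChangelog {
        private final List<String> entries = new ArrayList<>();
        private int truncatedBefore = 0;

        int nextSqn() { return entries.size(); }

        void append(String entry) { entries.add(entry); }

        /** Logically drops every entry below the given SQN. */
        void truncate(int sqn) { truncatedBefore = Math.max(truncatedBefore, sqn); }

        /** Returns the entries from the given SQN onwards that survived truncation. */
        List<String> persist(int fromSqn) {
            int from = Math.max(fromSqn, truncatedBefore);
            return new ArrayList<>(entries.subList(from, entries.size()));
        }
    }

    /** Hypothetical state wrapper that owns the metadataWritten flag. */
    static final class ToyState {
        private final String name;
        private final ToyChangelog log;
        private boolean metadataWritten;

        ToyState(String name, ToyChangelog log, boolean restoredFromChangelog) {
            this.name = name;
            this.log = log;
            // Assumption: after recovery from the changelog the metadata is already in the log.
            this.metadataWritten = restoredFromChangelog;
        }

        void update(String value) {
            if (!metadataWritten) {
                log.append("META:" + name); // metadata precedes the first change
                metadataWritten = true;
            }
            log.append("DATA:" + name + "=" + value);
        }

        /** Reset so that any data written after this point is preceded by metadata again. */
        void onMaterializationStart() { metadataWritten = false; }
    }

    /** Returns what a checkpoint taken after materialization would see in the changelog. */
    static List<String> run(boolean resetFlagOnMaterialization) {
        ToyChangelog log = new ToyChangelog();
        ToyState state = new ToyState("a", log, false);
        state.update("1");                    // appends META:a, DATA:a=1
        int materializedUpTo = log.nextSqn(); // materialization covers SQNs below this
        if (resetFlagOnMaterialization) {
            state.onMaterializationStart();
        }
        state.update("2");                    // change made after materialization started
        log.truncate(materializedUpTo);       // materialization completed
        return log.persist(materializedUpTo);
    }

    /** Returns the whole changelog after a state restored from it is updated once. */
    static List<String> runRecovery() {
        ToyChangelog log = new ToyChangelog();
        log.append("META:a");
        log.append("DATA:a=1");
        ToyState restored = new ToyState("a", log, true);
        restored.update("2");                 // no second META:a entry
        return log.persist(0);
    }

    public static void main(String[] args) {
        System.out.println(run(true));     // [META:a, DATA:a=2]
        System.out.println(run(false));    // [DATA:a=2]
        System.out.println(runRecovery()); // [META:a, DATA:a=1, DATA:a=2]
    }
}
```

Without the reset, the changelog that remains after truncation contains data for state "a" but no metadata for it, which is the problem this ticket describes. With the reset, the metadata lands at the start of the non-truncated part, before the first post-materialization change.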