curcur edited a comment on pull request #16606:
URL: https://github.com/apache/flink/pull/16606#issuecomment-917898103
Roman and I had several long discussions about the interfaces between
Materialization and `ChangelogKeyedStateBackend`. I am documenting them here for
future reference.
The main difference between the options is who is responsible for **keeping and
updating** the state related to `ChangelogKeyedStateBackend`, denoted as
`ChangelogSnapshotState`, which consists of three parts:
* the materialized snapshot from the underlying delegated state backend
* the non-materialized part of the current changelog
* the non-materialized changelog from previous logs (before failover or
  rescaling)
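To make the three parts concrete, here is a minimal sketch of what such a holder could look like. The class name `ChangelogSnapshotStateSketch`, its field names, and the use of plain strings as stand-ins for state handles are all assumptions for illustration; this is not the class from any of the commits discussed here.

```java
import java.util.List;

/**
 * Hypothetical, simplified holder for the three parts of ChangelogSnapshotState.
 * Handles are plain strings here; real code would use Flink state handle types.
 */
final class ChangelogSnapshotStateSketch {
    // Materialized snapshot taken from the underlying delegated state backend.
    private final List<String> materializedSnapshot;
    // Non-materialized part of the current changelog.
    private final List<String> currentNonMaterialized;
    // Non-materialized changelog from previous logs (before failover or rescaling).
    private final List<String> restoredNonMaterialized;

    ChangelogSnapshotStateSketch(
            List<String> materializedSnapshot,
            List<String> currentNonMaterialized,
            List<String> restoredNonMaterialized) {
        // Defensive copies keep the holder immutable, so it can be swapped atomically.
        this.materializedSnapshot = List.copyOf(materializedSnapshot);
        this.currentNonMaterialized = List.copyOf(currentNonMaterialized);
        this.restoredNonMaterialized = List.copyOf(restoredNonMaterialized);
    }

    List<String> materializedSnapshot() {
        return materializedSnapshot;
    }

    List<String> currentNonMaterialized() {
        return currentNonMaterialized;
    }

    List<String> restoredNonMaterialized() {
        return restoredNonMaterialized;
    }
}
```

Whichever component owns this object is the one that has to replace it after each materialization, which is exactly the question the versions below answer differently.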
We've discussed and tried out three versions:
1. `Materialization` coupled with `ChangelogKeyedStateBackend`,
   implemented in commit **fbd1e2d38ae6353506ceac8eb074bd24bdb29b62**,
   where `PeriodicMaterializer` is an inner class of
   `ChangelogKeyedStateBackend`.
   - Pros: state is shared and easy to reason about.
   - Cons: too tightly coupled; not flexible or extensible for either the
     keyed state backend or the materializer.

   This approach was discarded early in the discussion, so I won't go into
   further detail.
2. `ChangelogSnapshotState` is kept in the materializer. The materializer is
   conceptually a way to connect the delegated state backend to the changelog;
   the connection is made through `ChangelogSnapshotState`, as described above.
   Implemented in commit **3421b81c2502f61112bd131a7336c16e3dd30925**
   - Pros:
     1. Good isolation and extensibility. It gives a clear view of the
        changelog keyed state backend as four parts: log writer, delegated
        state backend, materializer, and the wrapping
        `ChangelogKeyedStateBackend` that does the double writing.
     2. More natural to understand and implement:
        - State is updated by the materializer and is accessible to
          `ChangelogKeyedStateBackend`.
        - The materializer is part of `ChangelogKeyedStateBackend`.
   - Cons:
     1. According to Roman, `ChangelogKeyedStateBackend` has implicit state
        (such as state double writes) besides the three parts mentioned above.
     2. Optimizations (such as batched writes) would require updating the
        materializer as well.
3. `ChangelogSnapshotState` and its updates are kept in
   `ChangelogKeyedStateBackend`. Materialization works as a stateless
   `Materialization Manager` that provides utility functions.
   Implemented in commit **75dec43024d91b896d488a4c9e979d486228398a**
   - Pros:
     1. All state is wrapped in `ChangelogKeyedStateBackend`.
     2. Conceptually, this also works naturally.
   - Cons:
     Circular construction. The `Materialization Manager` needs access to
     `ChangelogKeyedStateBackend` to update `ChangelogSnapshotState`, while
     `ChangelogKeyedStateBackend` is created from
     `StateBackend#createKeyedStateBackend`.
     To avoid the circular construction, the `Materialization Manager` has to
     be exposed at the time `ChangelogKeyedStateBackend` is created.
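
The construction-order issue in version 3 can be sketched roughly as follows. Everything here is hypothetical and simplified: `MaterializationManagerSketch`, `ChangelogBackendSketch`, the callback registration, and the string stand-in for a snapshot handle are assumptions for illustration, not the code in the commit above. The point is only that the manager must already exist and be reachable when the backend is created, so that the backend can hand it a way to update its state.

```java
import java.util.function.Consumer;

/** Hypothetical manager: triggers materialization but holds no snapshot state itself. */
final class MaterializationManagerSketch {
    // Callback into the backend; registered when the backend is created.
    private Consumer<String> snapshotUpdater;

    void register(Consumer<String> snapshotUpdater) {
        this.snapshotUpdater = snapshotUpdater;
    }

    /** Pretends a materialization finished and reports the result to the backend. */
    void materialize(String newSnapshotHandle) {
        if (snapshotUpdater == null) {
            throw new IllegalStateException("no backend registered yet");
        }
        snapshotUpdater.accept(newSnapshotHandle);
    }
}

/** Hypothetical backend that owns the snapshot state and all updates to it. */
final class ChangelogBackendSketch {
    private volatile String materializedSnapshot = "none";

    private ChangelogBackendSketch() {}

    /**
     * Stand-in for backend creation: the manager is passed in from outside,
     * which is what "exposed at creation time" means in this sketch.
     */
    static ChangelogBackendSketch create(MaterializationManagerSketch manager) {
        ChangelogBackendSketch backend = new ChangelogBackendSketch();
        manager.register(backend::updateSnapshot);
        return backend;
    }

    private void updateSnapshot(String newSnapshotHandle) {
        this.materializedSnapshot = newSnapshotHandle;
    }

    String materializedSnapshot() {
        return materializedSnapshot;
    }
}
```

In this sketch the manager is created first, then passed to `ChangelogBackendSketch.create(manager)`, and only afterwards may `manager.materialize(...)` be called. The cost is visible in the signature: whoever creates the backend has to know about the manager.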
@rkhachatryan what do you think, Roman?