platinumhamburg commented on PR #1946:
URL: https://github.com/apache/fluss/pull/1946#issuecomment-3509111813
@polyzos My current thinking is that the consistency issue should be addressed
separately in two domains:
- For the streaming computation domain: I think we should expose WriterState
  checkpoint/restore interfaces to the upper-layer compute engines. Engines
  like Flink can use these interfaces to persist writer state, which lets
  Fluss rely on the existing idempotence manager to ensure idempotent data
  writes.
- For the batch processing domain: the problem is significantly more complex.
  Each execution of a batch task creates independent Writers, so the
  idempotence manager cannot guarantee computational consistency. Imagine a
  batch task that fails over midway through writing data: the aggregation
  state of all related primary keys will be completely corrupted. To ensure
  consistency in the batch processing domain, we will need:
  - Explicit BeginTransaction / CommitTransaction / AbortTransaction
    interfaces exposed to compute engines (PrepareTransaction is optional)
  - True transactional visibility control for single Buckets (not through
    transactional buffers like PrewriteBuffer, but complete transaction control
    based on MvccValueEncoder/MvccValueDecoder + TransactionStateManager)
  - Compute engines coordinating calls to Bucket-local transaction managers
    through a global transaction manager to complete transactional writes in
    an orderly manner.
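
To make the batch proposal concrete, here is a minimal in-memory sketch of a
global coordinator driving Bucket-local transaction managers. Everything in it
is an assumption for illustration: `BucketTxnManager`, `GlobalTxnCoordinator`,
and their methods are hypothetical names, not existing Fluss classes, and the
sketch models a sum aggregation with plain maps instead of
MvccValueEncoder/MvccValueDecoder. It also skips PrepareTransaction and any
durability, so it does not handle a failure in the middle of the commit loop.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical Bucket-local transaction manager (not an existing Fluss class).
// Staged writes stay invisible to readers until the transaction commits.
final class BucketTxnManager {
    private final Map<String, Long> committed = new HashMap<>();
    private final Map<Long, Map<String, Long>> pending = new HashMap<>();

    void begin(long txnId) {
        pending.put(txnId, new HashMap<>());
    }

    // Stage a delta for a sum aggregation on the given primary key.
    void write(long txnId, String key, long delta) {
        Map<String, Long> staged = pending.get(txnId);
        if (staged == null) {
            throw new IllegalStateException("unknown transaction " + txnId);
        }
        staged.merge(key, delta, Long::sum);
    }

    // Fold the staged deltas into the visible aggregation state.
    void commit(long txnId) {
        Map<String, Long> staged = pending.remove(txnId);
        if (staged == null) {
            throw new IllegalStateException("unknown transaction " + txnId);
        }
        staged.forEach((k, v) -> committed.merge(k, v, Long::sum));
    }

    // Drop the staged deltas; the visible state is untouched.
    void abort(long txnId) {
        pending.remove(txnId);
    }

    long read(String key) {
        return committed.getOrDefault(key, 0L);
    }
}

// Hypothetical global coordinator that a compute engine would call.
final class GlobalTxnCoordinator {
    private final BucketTxnManager[] buckets;
    private long nextTxnId = 1;

    GlobalTxnCoordinator(BucketTxnManager... buckets) {
        this.buckets = buckets;
    }

    long beginTransaction() {
        long txnId = nextTxnId++;
        for (BucketTxnManager b : buckets) {
            b.begin(txnId);
        }
        return txnId;
    }

    void commitTransaction(long txnId) {
        for (BucketTxnManager b : buckets) {
            b.commit(txnId);
        }
    }

    void abortTransaction(long txnId) {
        for (BucketTxnManager b : buckets) {
            b.abort(txnId);
        }
    }
}
```

With this shape, a batch task that fails over would call `abortTransaction`
(or have its transaction timed out), and the retry would start from the last
committed aggregation state instead of a partially applied one.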
In summary, for streaming computation we should be able to achieve the
required consistency guarantees with relatively minor modifications to our
current infrastructure. For batch processing, however, we will need to
introduce a major architectural extension.
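
For the streaming side, the WriterState checkpoint/restore idea can be sketched
as follows. This is only an illustration under stated assumptions: the fields
of `WriterState`, and the `SketchWriter` and `SketchBucket` classes, are
hypothetical and do not reflect the actual Fluss idempotence manager. The
point it shows is that a writer restored from a checkpoint reuses its writer
ID and sequence number, so replayed writes are recognized as duplicates.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical snapshot of a writer's idempotence state.
final class WriterState {
    final long writerId;
    final int nextSequence;

    WriterState(long writerId, int nextSequence) {
        this.writerId = writerId;
        this.nextSequence = nextSequence;
    }
}

// Hypothetical server-side bucket holding one sum aggregate.
// It remembers the last sequence applied per writer and drops replays.
final class SketchBucket {
    private final Map<Long, Integer> lastSequence = new HashMap<>();
    private long sum;

    // Returns true if the write was applied, false if it was a duplicate.
    boolean append(long writerId, int sequence, long delta) {
        int last = lastSequence.getOrDefault(writerId, -1);
        if (sequence <= last) {
            return false;
        }
        lastSequence.put(writerId, sequence);
        sum += delta;
        return true;
    }

    long sum() {
        return sum;
    }
}

// Hypothetical client-side writer with checkpoint/restore hooks.
final class SketchWriter {
    private final long writerId;
    private int nextSequence;
    private final SketchBucket bucket;

    SketchWriter(long writerId, SketchBucket bucket) {
        this(writerId, 0, bucket);
    }

    private SketchWriter(long writerId, int nextSequence, SketchBucket bucket) {
        this.writerId = writerId;
        this.nextSequence = nextSequence;
        this.bucket = bucket;
    }

    boolean write(long delta) {
        return bucket.append(writerId, nextSequence++, delta);
    }

    // Called by the engine when it takes a checkpoint.
    WriterState snapshotState() {
        return new WriterState(writerId, nextSequence);
    }

    // Called by the engine after a failover, before replaying input.
    static SketchWriter restore(WriterState state, SketchBucket bucket) {
        return new SketchWriter(state.writerId, state.nextSequence, bucket);
    }
}
```

In an engine like Flink, `snapshotState` and `restore` would be wired into the
engine's own checkpoint lifecycle; how exactly that wiring looks is left open
here.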