platinumhamburg commented on PR #1946:
URL: https://github.com/apache/fluss/pull/1946#issuecomment-3509111813

   @polyzos My current thinking is that the consistency issue should be
addressed separately in two domains:
   
   - For the streaming computation domain: I think we should expose WriterState
checkpoint/restore interfaces to the upper-layer compute engines. Engines like
Flink can use these interfaces to implement state persistence, which lets
Fluss rely on the existing idempotence manager to ensure idempotent data
writes.
   
   - For the batch processing domain: The problem is significantly more
complex, because each batch task execution creates its own independent
Writers, so the idempotence manager cannot guarantee computational
consistency. Imagine a batch task that fails over midway through writing
data: the aggregation state of every affected primary key will be completely
corrupted. To ensure consistency in the batch processing domain, we will
need:
   
        - Explicit BeginTransaction (PrepareTransaction is optional) / 
CommitTransaction / AbortTransaction interfaces exposed to compute engines
   
        - True transactional visibility control for individual buckets (not
through transactional buffers like PrewriteBuffer, but full transaction control
based on MvccValueEncoder/MvccValueDecoder + TransactionStateManager)
   
        - Compute engines coordinating calls to the bucket-local transaction
managers through a global transaction manager, so that transactional writes
complete in an orderly manner.
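For the streaming case, the checkpoint/restore idea can be illustrated with a minimal sketch. Everything below is hypothetical and not the actual Fluss API: `WriterStateSketch`, the `WriterState` record, `Writer.checkpoint()`/`Writer.restore()`, and the simplified server-side `Bucket` are invented for illustration. The assumption is that the engine snapshots the writer id and per-bucket sequence numbers with its checkpoint, then replays input deterministically after recovery. Replayed batches reuse the same sequence numbers, so the bucket's duplicate check drops them.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of WriterState checkpoint/restore; not the Fluss API. */
public class WriterStateSketch {

    /** Snapshot of a writer's idempotence state: writer id plus next sequence per bucket. */
    record WriterState(long writerId, Map<Integer, Integer> nextSeqByBucket) {}

    /** Client-side writer that assigns increasing batch sequence numbers per bucket. */
    static class Writer {
        final long writerId;
        final Map<Integer, Integer> nextSeq = new HashMap<>();

        Writer(long writerId) { this.writerId = writerId; }

        int nextSequence(int bucket) {
            int seq = nextSeq.getOrDefault(bucket, 0);
            nextSeq.put(bucket, seq + 1);
            return seq;
        }

        /** Called by the engine when it takes a checkpoint (copies the map). */
        WriterState checkpoint() { return new WriterState(writerId, new HashMap<>(nextSeq)); }

        /** Called by the engine on recovery: same writer id, sequences from the snapshot. */
        static Writer restore(WriterState state) {
            Writer w = new Writer(state.writerId());
            w.nextSeq.putAll(state.nextSeqByBucket());
            return w;
        }
    }

    /** Simplified server-side bucket: ignores batches whose sequence was already applied. */
    static class Bucket {
        final Map<Long, Integer> lastSeqByWriter = new HashMap<>();
        long sum = 0; // stand-in for aggregation state

        boolean append(long writerId, int seq, long value) {
            int last = lastSeqByWriter.getOrDefault(writerId, -1);
            if (seq <= last) return false; // duplicate from replay: dropped (gap checks omitted)
            lastSeqByWriter.put(writerId, seq);
            sum += value;
            return true;
        }
    }

    public static void main(String[] args) {
        Bucket bucket = new Bucket();
        Writer writer = new Writer(42L);

        bucket.append(writer.writerId, writer.nextSequence(0), 10); // seq 0
        WriterState snapshot = writer.checkpoint();                  // next seq = 1
        bucket.append(writer.writerId, writer.nextSequence(0), 20); // seq 1, after checkpoint

        // Failure: the engine restores the writer and replays input since the checkpoint.
        Writer restored = Writer.restore(snapshot);
        boolean applied = bucket.append(restored.writerId, restored.nextSequence(0), 20); // seq 1 again
        bucket.append(restored.writerId, restored.nextSequence(0), 30);                  // seq 2, new data

        System.out.println("replay applied=" + applied + ", sum=" + bucket.sum);
    }
}
```

Running `main` prints `replay applied=false, sum=60`: the replayed batch is ignored, so the aggregate counts each record once. A real idempotence manager would also reject out-of-order gaps, which this sketch leaves out.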
   
   In summary, for streaming computation we should be able to achieve the
required consistency guarantees with relatively minor modifications to our
current infrastructure. For batch processing, however, we will need to
introduce a major architectural extension.
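To make the batch-side requirements more concrete, here is a minimal in-memory sketch of bucket-local transactions with MVCC-style visibility, driven by a global coordinator using two-phase commit. All names (`BucketTxnSketch`, `Bucket`, `Version`, `GlobalTxnManager`, `prepare`/`commit`/`abort`) are hypothetical and do not correspond to MvccValueEncoder/MvccValueDecoder or TransactionStateManager. Tagging each value with its transaction id stands in for an MVCC value encoding. Durability of the coordinator's decision and recovery of in-doubt transactions are deliberately not modeled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: bucket-local MVCC transactions plus a global 2PC coordinator. */
public class BucketTxnSketch {

    enum TxnState { OPEN, PREPARED, COMMITTED, ABORTED }

    /** A value tagged with the id of the transaction that wrote it (stand-in for MVCC encoding). */
    record Version(long txnId, long value) {}

    /** Bucket-local transaction manager: tracks transaction states and versions per key. */
    static class Bucket {
        final Map<Long, TxnState> txnStates = new HashMap<>();
        final Map<String, List<Version>> versions = new HashMap<>();

        void begin(long txnId) { txnStates.put(txnId, TxnState.OPEN); }

        void write(long txnId, String key, long value) {
            if (txnStates.get(txnId) != TxnState.OPEN) throw new IllegalStateException("txn not open");
            versions.computeIfAbsent(key, k -> new ArrayList<>()).add(new Version(txnId, value));
        }

        boolean prepare(long txnId) {
            if (txnStates.get(txnId) != TxnState.OPEN) return false;
            txnStates.put(txnId, TxnState.PREPARED);
            return true;
        }

        void commit(long txnId) { txnStates.put(txnId, TxnState.COMMITTED); }

        void abort(long txnId) { txnStates.put(txnId, TxnState.ABORTED); }

        /** Sum-aggregate over versions from committed transactions only. */
        long read(String key) {
            long sum = 0;
            for (Version v : versions.getOrDefault(key, List.of())) {
                if (txnStates.get(v.txnId()) == TxnState.COMMITTED) sum += v.value();
            }
            return sum;
        }
    }

    /** Global transaction manager: two-phase commit across all buckets a job writes to. */
    static class GlobalTxnManager {
        private long nextTxnId = 1;

        long begin(List<Bucket> buckets) {
            long txnId = nextTxnId++;
            for (Bucket b : buckets) b.begin(txnId);
            return txnId;
        }

        boolean commit(long txnId, List<Bucket> buckets) {
            for (Bucket b : buckets) {
                if (!b.prepare(txnId)) { abort(txnId, buckets); return false; } // phase 1
            }
            for (Bucket b : buckets) b.commit(txnId); // phase 2
            return true;
        }

        void abort(long txnId, List<Bucket> buckets) {
            for (Bucket b : buckets) b.abort(txnId);
        }
    }

    public static void main(String[] args) {
        Bucket b0 = new Bucket(), b1 = new Bucket();
        List<Bucket> buckets = List.of(b0, b1);
        GlobalTxnManager coordinator = new GlobalTxnManager();

        // First attempt writes part of its output, then the task fails over.
        long failed = coordinator.begin(buckets);
        b0.write(failed, "k", 5);
        coordinator.abort(failed, buckets);

        // The retry writes its full output and commits atomically across both buckets.
        long retry = coordinator.begin(buckets);
        b0.write(retry, "k", 5);
        b1.write(retry, "k2", 7);
        coordinator.commit(retry, buckets);

        System.out.println(b0.read("k") + " " + b1.read("k2"));
    }
}
```

Running `main` prints `5 7`. Without transactional visibility, the partial write from the failed attempt would remain, and the aggregate for `k` would read 10 after the retry. This is the corruption scenario described for batch jobs.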

