yzeng1618 commented on issue #10574:
URL: https://github.com/apache/seatunnel/issues/10574#issuecomment-4020704191
sequenceDiagram
participant Broker as Kafka Broker
participant SplitReader as KafkaPartitionSplitReader
participant Emitter as KafkaRecordEmitter
participant Reader as KafkaSourceReader
participant SrcFlow as SourceFlowLifeCycle
participant SinkFlow as SinkFlowLifeCycle
participant JdbcWriter as JdbcSinkWriter
participant Ckpt as CheckpointCoordinator
participant Enum as KafkaSourceSplitEnumerator
%% 正常消费链路
Broker->>SplitReader: poll()
SplitReader->>Emitter: emitRecord(record)
Emitter->>Emitter: deserialize & update currentOffset
%% 快照状态
SrcFlow->>Reader: snapshotState(checkpointId)
Reader->>SrcFlow: return KafkaSourceSplit(startOffset=currentOffset)
SrcFlow->>Ckpt: save state & ack
%% Sink 异常
SinkFlow->>JdbcWriter: prepareCommit()
JdbcWriter-->>SinkFlow: flush error / BatchUpdateException
Note over JdbcWriter,SinkFlow: 非XA JDBC无恢复状态,sink失败
%% 恢复流程(问题点)
Ckpt-->>SrcFlow: restore latest completed checkpoint
SrcFlow->>Enum: RestoredSplitOperation(addSplitsBack)
%% 核心BUG
Enum->>Enum: force use GROUP_OFFSETS
Enum->>Enum: compute wrong start offset(endOffset+1/group offset)
Enum->>SplitReader: assign restored split
%% 最终错误
SplitReader->>Broker: seek(wrong offset)
Note over SplitReader,Broker: 未使用CK偏移量,导致重复/丢失
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]