1032851561 commented on issue #8087:
URL: https://github.com/apache/hudi/issues/8087#issuecomment-1459753355
> > The first checkpoint of the reader is successful after all splits are
consumed. It takes a lot of time to cause checkpoint timeout.
>
> So which checkpoint is timed out? The `StreamReadOperator` caches all
the input splits that have not yet been consumed. If the reader starts reading
from the earliest commit, the checkpoint data set can be large. How large is the
checkpoint data for the successful checkpoint of the `StreamReadOperator`?
The first checkpoint is the one that times out, because the `StreamReadOperator`
does not take its checkpoint until all cached splits have been consumed. There
are only about 1200 splits, so the checkpoint data size should be small.
I now cap the number of records read per checkpoint interval, and with that
change it looks good.
```
private void enqueueProcessSplits() {
  // Stop scheduling splits once this checkpoint interval has used up its budget.
  if (maxConsumeRecordsPerCkp > 0 && consumedRecordsBetweenCkp > maxConsumeRecordsPerCkp) {
    return; // reached the max records to consume in this checkpoint interval
  }
  // ... remaining logic omitted
}

private void consumeAsMiniBatch(MergeOnReadInputSplit split) throws IOException {
  consumedRecordsBetweenCkp += 1L;
  split.consume();
}

public void snapshotState(StateSnapshotContext context) throws Exception {
  consumedRecordsBetweenCkp = 0L; // reset when a new checkpoint arrives
}
```
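To show the idea in isolation, here is a minimal, self-contained sketch of the same throttle outside the operator. `CheckpointThrottle` and its method names are hypothetical and are not part of Hudi or Flink. The sketch also uses a strict cap (`<`), which differs slightly from the `>` check in the snippet above.

```java
/** Hypothetical helper (not a Hudi class): caps the work done between two checkpoints. */
final class CheckpointThrottle {
    private final long maxPerCheckpoint;       // a value <= 0 disables the limit
    private long consumedSinceCheckpoint = 0L; // units consumed in the current interval

    CheckpointThrottle(long maxPerCheckpoint) {
        this.maxPerCheckpoint = maxPerCheckpoint;
    }

    /** Returns true if more work may be scheduled in the current checkpoint interval. */
    boolean canConsume() {
        return maxPerCheckpoint <= 0 || consumedSinceCheckpoint < maxPerCheckpoint;
    }

    /** Call once per consumed unit (a record or a mini-batch, depending on what is counted). */
    void recordConsumed() {
        consumedSinceCheckpoint += 1L;
    }

    /** Call when a checkpoint snapshot is taken, so the next interval starts with a fresh budget. */
    void onCheckpoint() {
        consumedSinceCheckpoint = 0L;
    }

    public static void main(String[] args) {
        CheckpointThrottle throttle = new CheckpointThrottle(2);
        int consumed = 0;
        while (throttle.canConsume()) {
            throttle.recordConsumed();
            consumed++;
        }
        System.out.println("consumed before checkpoint: " + consumed);
        throttle.onCheckpoint(); // the budget is available again after the snapshot
        System.out.println("can consume after checkpoint: " + throttle.canConsume());
    }
}
```

Because the operator stops enqueueing work once the budget is used up, the task becomes idle quickly and the pending checkpoint can be taken instead of waiting for every cached split to finish.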