[GitHub] [hudi] 1032851561 commented on issue #8087: [SUPPORT] split_reader don't checkpoint before consuming all splits

via GitHub Wed, 08 Mar 2023 00:43:47 -0800


1032851561 commented on issue #8087:
URL: https://github.com/apache/hudi/issues/8087#issuecomment-1459753355


   > > The first checkpoint of the reader is successful after all splits are 
consumed. It takes a lot of time to cause checkpoint timeout.
   > 
   > So which checkpoint is timed out? The `StreamReadOperator` would cache all 
the input splits when they are not consumed, if the reader starts reading from 
the earliest, the checkpoint data set can be large, how much is the ckp data 
size for the successfull ckp of the `StreamReadOperator` ?
   
   The first checkpoint is timed out , because of  `StreamReadOperator` don't 
trigger until the all cache  splits is consumed . The number of splits is about 
1200 , so the ckp data size should be small. 
   
   Now I control the upper limit of the number of data read per checkpoint 
interval, then it looks gook.
   ```
   private void enqueueProcessSplits() {
       if (maxConsumeRecordsPerCkp > 0 && consumedRecordsBetweenCkp > 
maxConsumeRecordsPerCkp)
               return;  //reach max consume records in this checkpoint interval
   }
   
   private void consumeAsMiniBatch(MergeOnReadInputSplit split) throws 
IOException {
            consumedRecordsBetweenCkp += 1L;
            split.consume();
   }
   
   public void snapshotState(StateSnapshotContext context) throws Exception {
           consumedRecordsBetweenCkp = 0L;  // reset when a new checkpoint 
coming.
   }
   ```
   
   
   
   
   
   
   
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [hudi] 1032851561 commented on issue #8087: [SUPPORT] split_reader don't checkpoint before consuming all splits

Reply via email to