ChiehFu commented on issue #10914:
URL: https://github.com/apache/hudi/issues/10914#issuecomment-2062539177

   @danny0405 I have some follow-up questions.
   Say I run the following steps to set up my data pipeline:
   1. Run batch job 1 to bulk_insert historical data into a Hudi table.
   2. Run Flink streaming job 2 with index bootstrap enabled, and terminate the
      job after a checkpoint succeeds.
   3. Run Flink streaming job 3 with index bootstrap disabled, restoring from the
      checkpoint that job 2 created.
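
   To make the setup concrete, here is a rough sketch of how the three jobs would differ in their Hudi table options. It assumes the option keys `write.operation` and `index.bootstrap.enabled` as I understand them from the Hudi Flink configuration; the table path and the `job_options` / `with_clause` helpers are hypothetical and only for illustration, not my actual job code.

```python
# Sketch only: option keys are assumed from the Hudi Flink configuration;
# the path below is a hypothetical placeholder.
BASE_OPTIONS = {
    "connector": "hudi",
    "path": "s3://my-bucket/hudi/my_table",  # hypothetical location
}


def job_options(job: int) -> dict:
    """Return the table options for job 1, 2 or 3 of the pipeline above."""
    opts = dict(BASE_OPTIONS)
    if job == 1:
        # Batch job: bulk_insert the historical data.
        opts["write.operation"] = "bulk_insert"
    elif job == 2:
        # First streaming job: load the index from the existing table.
        opts["index.bootstrap.enabled"] = "true"
    elif job == 3:
        # Second streaming job: restore from job 2's checkpoint, no bootstrap.
        opts["index.bootstrap.enabled"] = "false"
    else:
        raise ValueError(f"unknown job: {job}")
    return opts


def with_clause(opts: dict) -> str:
    """Render the options as the body of a Flink SQL WITH (...) clause."""
    return ",\n".join(f"  '{key}' = '{value}'" for key, value in opts.items())


if __name__ == "__main__":
    for job in (1, 2, 3):
        print(f"-- job {job}\nWITH (\n{with_clause(job_options(job))}\n)")
```

   Restoring job 3 from job 2's checkpoint is done when submitting the job and is not part of these table options.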
   
   My questions are:
   - Would the checkpoint of job 3 contain all the index information retrieved
     by the index bootstrap process in job 2? I ask because I noticed a
     significant size difference between the checkpoints of job 2 and job 3
     (500GB for job 2 vs. < 50GB for job 3).
   - If job 3 fails and I need to start a job 4 from job 3's latest
     checkpoint, do I need to have index bootstrap enabled?
   


