ChiehFu commented on issue #10914: URL: https://github.com/apache/hudi/issues/10914#issuecomment-2062539177
@danny0405 I have some follow-up questions. Say I run the following steps to set up my data pipeline:

1. Run batch job 1 to bulk_insert historical data into a Hudi table.
2. Run Flink streaming job 2 with index bootstrap enabled, and terminate it after a checkpoint succeeds.
3. Run Flink streaming job 3 with index bootstrap disabled, restoring from the checkpoint that job 2 created.

My questions are:

- Would the checkpoint of job 3 contain all of the index information retrieved by the index bootstrap process in job 2? I ask because I noticed a significant size difference between the checkpoints of job 2 and job 3 (500GB for job 2 vs. < 50GB for job 3).
- If job 3 fails and I need to start a job 4 from job 3's latest checkpoint, do I need to have index bootstrap enabled?
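
To make the setup concrete, here is a rough sketch of how the two streaming jobs differ. It is only an illustration: the `StreamJob` class, the `restore_from` field, and the job names are made up for this comment. The only real setting referenced is the Hudi Flink option `index.bootstrap.enabled`, which I am assuming defaults to `false` when unset.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class StreamJob:
    """Hypothetical description of one Flink streaming job in the pipeline."""
    name: str
    # Hudi Flink table options passed to the job (only the key below is a real option).
    hudi_options: Dict[str, str] = field(default_factory=dict)
    # Name of the job whose checkpoint this job restores from (illustrative field).
    restore_from: Optional[str] = None

    def bootstrap_enabled(self) -> bool:
        # Assumption: the option is treated as disabled when it is not set.
        return self.hudi_options.get("index.bootstrap.enabled", "false") == "true"


# Job 2: loads the index from the existing table, then is stopped after a checkpoint.
job2 = StreamJob("job2", {"index.bootstrap.enabled": "true"})

# Job 3: bootstrap disabled, state restored from job 2's checkpoint.
job3 = StreamJob("job3", {"index.bootstrap.enabled": "false"}, restore_from="job2")

for job in (job2, job3):
    print(job.name, "bootstrap:", job.bootstrap_enabled(), "restore from:", job.restore_from)
```

Job 4 in my second question would look like job 3 with `restore_from="job3"`; what I am unsure about is which value `index.bootstrap.enabled` should take there.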