Zouxxyy opened a new issue, #9161: URL: https://github.com/apache/hudi/issues/9161
The following is the current configuration for bootstrap mode selection:

- `hoodie.bootstrap.mode.selector`: class
- `hoodie.bootstrap.mode.selector.regex`
- `hoodie.bootstrap.mode.selector.regex.mode`

At present there are problems with this configuration, mainly introduced by https://github.com/apache/hudi/pull/6673. When we specify `hoodie.bootstrap.mode.selector.regex`, we should actually use `BootstrapRegexModeSelector`. There are also a lot of usage errors in the test cases.

The correct behavior is: if `BootstrapRegexModeSelector` is configured, `SparkBootstrapCommitActionExecutor` will generate two commits, one for FULL_RECORD bootstrap and one for METADATA_ONLY bootstrap.

Back to my current work: I am trying to remove the commit behavior in `CommitActionExecutor`, so that a commit can only be done through `writeClient`. The advantages are as follows:

1. The `hoodie.auto.commit` configuration no longer exists. Its current usage is very hacky and very difficult to read and maintain.
2. It is no longer necessary to maintain commit code in two classes (`write client` and `CommitActionExecutor`).
3. `CommitActionExecutor` is purer; for example, lockManager is no longer needed.

However, during implementation, because of `BootstrapRegexModeSelector`, bootstrap needs to complete two commits in one operation, which makes it difficult to remove the commit behavior in `SparkBootstrapCommitActionExecutor`.

Therefore, and because the selector is already buggy, I plan to remove it. There will only be one configuration for bootstrap: `hoodie.bootstrap.selector.mode`, and bootstrap can only be done in `FULL_RECORD` or `METADATA_ONLY` mode (no mixing).

If everyone agrees, I can make this modification and then continue with the following work: https://issues.apache.org/jira/browse/HUDI-6514
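For illustration only, here is a minimal sketch of the two configuration shapes discussed above, built with plain `java.util.Properties` rather than any Hudi API. The property keys are the ones named in this issue. The class `BootstrapConfigSketch`, its two helper methods, the example regex, and the fully qualified selector class name are assumptions made for the sketch, and `hoodie.bootstrap.selector.mode` is only the proposed key, so it may not exist in any release.

```java
import java.util.Properties;

// Hypothetical helper class for illustration; not part of Hudi.
public class BootstrapConfigSketch {

    // Current shape: a selector class plus a partition regex and the mode
    // applied to partitions that match the regex.
    public static Properties currentRegexConfig(String partitionRegex, String regexMode) {
        Properties props = new Properties();
        // Assumed fully qualified name of the selector class named in the issue.
        props.setProperty("hoodie.bootstrap.mode.selector",
                "org.apache.hudi.client.bootstrap.selector.BootstrapRegexModeSelector");
        props.setProperty("hoodie.bootstrap.mode.selector.regex", partitionRegex);
        props.setProperty("hoodie.bootstrap.mode.selector.regex.mode", regexMode);
        return props;
    }

    // Proposed shape: one key, one mode for the whole table (no mixing).
    public static Properties proposedConfig(String mode) {
        if (!"FULL_RECORD".equals(mode) && !"METADATA_ONLY".equals(mode)) {
            throw new IllegalArgumentException("Unsupported bootstrap mode: " + mode);
        }
        Properties props = new Properties();
        props.setProperty("hoodie.bootstrap.selector.mode", mode);
        return props;
    }

    public static void main(String[] args) {
        // The regex below is an arbitrary example value.
        currentRegexConfig("2023/.*", "METADATA_ONLY").list(System.out);
        proposedConfig("FULL_RECORD").list(System.out);
    }
}
```

With the proposed shape, a single bootstrap run maps to exactly one mode, so one commit is enough and it can be issued by the write client.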
