Zouxxyy opened a new issue, #9161:
URL: https://github.com/apache/hudi/issues/9161

   The following is the current configuration of bootstrap's mode selection:
   
   - hoodie.bootstrap.mode.selector: class
   - hoodie.bootstrap.mode.selector.regex
   - hoodie.bootstrap.mode.selector.regex.mode
   
   This configuration currently has problems, mainly introduced by 
https://github.com/apache/hudi/pull/6673. When 
`hoodie.bootstrap.mode.selector.regex` is specified, the selector that should 
actually be used is `BootstrapRegexModeSelector`. Many of the test cases also 
use these settings incorrectly.
   
   The correct behavior is: If `BootstrapRegexModeSelector` is configured, 
`SparkBootstrapCommitActionExecutor` will generate two commits, one for 
FULL_RECORD bootstrap and one for METADATA_ONLY bootstrap.
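
   As a rough illustration of why a regex selector leads to two commits, here 
is a minimal Python sketch (not Hudi code). It assumes that partitions matching 
the regex get the configured mode and all other partitions get the opposite 
mode; `select_modes` and the sample partition paths are hypothetical.

```python
import re
from collections import defaultdict

FULL_RECORD = "FULL_RECORD"
METADATA_ONLY = "METADATA_ONLY"


def select_modes(partitions, regex, regex_mode):
    """Group partition paths by bootstrap mode.

    Assumption: partitions that fully match `regex` get `regex_mode`;
    every other partition gets the opposite mode.
    """
    other_mode = METADATA_ONLY if regex_mode == FULL_RECORD else FULL_RECORD
    pattern = re.compile(regex)
    groups = defaultdict(list)
    for path in partitions:
        mode = regex_mode if pattern.fullmatch(path) else other_mode
        groups[mode].append(path)
    return dict(groups)


# Hypothetical partition paths, for illustration only.
partitions = ["2023/01/01", "2023/01/02", "2022/12/31"]
plan = select_modes(partitions, r"2023/.*", FULL_RECORD)

# One commit per non-empty group, so a mixed table needs two commits.
commits_needed = len(plan)
```

   With a single fixed mode, `plan` would always contain one group, so one 
commit would be enough.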
   
   Back to my current work: I am trying to remove the commit behavior from 
`CommitActionExecutor`, so that a commit can only be done through `writeClient`. 
The advantages are as follows:
   
   1. The `hoodie.auto.commit` configuration would no longer exist. Its current 
usage is hacky and very difficult to read and maintain.
   2. Commit code would no longer need to be maintained in two classes (the 
write client and `CommitActionExecutor`).
   3. `CommitActionExecutor` becomes simpler; for example, it no longer needs a 
lockManager.
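
   The intended split of responsibilities could look roughly like the Python 
sketch below. It is only a model of the idea, not the Hudi classes: 
`ActionExecutor`, `WriteClient`, `WriteResult`, and `Timeline` are hypothetical 
stand-ins, and the real locking and conflict handling are omitted.

```python
from dataclasses import dataclass, field


@dataclass
class WriteResult:
    """What an executor hands back: write stats that are not yet committed."""
    instant_time: str
    write_stats: list


@dataclass
class Timeline:
    """Stand-in for the table timeline; records completed instants."""
    completed: list = field(default_factory=list)


class ActionExecutor:
    """Writes data only. It holds no lock manager and never commits."""

    def execute(self, instant_time, records):
        stats = [f"wrote:{r}" for r in records]
        return WriteResult(instant_time, stats)


class WriteClient:
    """The single place where a commit can happen."""

    def __init__(self, timeline):
        self.timeline = timeline

    def write(self, instant_time, records):
        return ActionExecutor().execute(instant_time, records)

    def commit(self, result):
        # Locking and conflict resolution would live here, in one class only.
        self.timeline.completed.append(result.instant_time)
        return True


timeline = Timeline()
client = WriteClient(timeline)
result = client.write("001", ["a", "b"])
pending = list(timeline.completed)  # snapshot before commit: still empty
client.commit(result)
```

   In this shape there is no auto-commit flag to thread through the executor: 
the caller always decides when `commit` runs.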
   
   However, during implementation, I found that because of 
`BootstrapRegexModeSelector`, bootstrap needs to complete two commits in one 
operation, which makes it difficult to remove the commit behavior from 
`SparkBootstrapCommitActionExecutor`. Since the regex selector is also affected 
by the bug described above, I plan to remove it.
   
   There would then be only one configuration for bootstrap, 
`hoodie.bootstrap.selector.mode`, and bootstrap could only run in `FULL_RECORD` 
or `METADATA_ONLY` mode (no mixing of the two).
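
   A minimal Python sketch of how such a single setting might be resolved is 
shown below. The key name is the one proposed above; the function 
`resolve_bootstrap_mode` and the `METADATA_ONLY` default are assumptions made 
for illustration, not decided behavior.

```python
VALID_MODES = ("FULL_RECORD", "METADATA_ONLY")


def resolve_bootstrap_mode(props, default="METADATA_ONLY"):
    """Return the one bootstrap mode applied to every partition.

    Assumption: the default value is illustrative only.
    """
    mode = props.get("hoodie.bootstrap.selector.mode", default).strip().upper()
    if mode not in VALID_MODES:
        raise ValueError(f"unsupported bootstrap mode: {mode!r}")
    return mode


mode = resolve_bootstrap_mode({"hoodie.bootstrap.selector.mode": "full_record"})
```

   Because exactly one mode is returned, the whole bootstrap maps to a single 
commit.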
   
   If everyone agrees, I can make this change and then continue with the 
follow-up work in https://issues.apache.org/jira/browse/HUDI-6514
   
   

