yihua opened a new pull request, #8176:
URL: https://github.com/apache/hudi/pull/8176
### Change Logs
This PR extends the automatic inference of the key generator to Deltastreamer,
in addition to the Spark Datasource, Streaming, and SQL writers.
- Moves the logic of automatic inference of key generator to `KeyGenUtils`
- Refactors `HoodieSparkKeyGeneratorFactory` for code reuse
- Fixes `DeltaSync` for using the inferred key generator
- Removes unnecessary key generator configs in a few tests
- Adds new tests in `TestHoodieDeltaStreamer`, `TestKeyGenUtils`, and
`TestHoodieSparkKeyGeneratorFactory` to verify the automatic inference of the
key generator type and class name.
Note that if either `hoodie.datasource.write.keygenerator.class` or
`hoodie.datasource.write.keygenerator.type` is explicitly set, the configured
value takes precedence (`hoodie.datasource.write.keygenerator.class` first,
then `hoodie.datasource.write.keygenerator.type`) and key generator inference
is not triggered.
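
The precedence described above can be sketched roughly as follows. This is a
minimal, hypothetical illustration and not the code in `KeyGenUtils` or
`HoodieSparkKeyGeneratorFactory`: the class `KeyGenResolutionSketch`, its
methods, and the `class:`/`type:` string results are invented for this example.
The inference rule itself (no partition path field means NON_PARTITION, exactly
one record key field and one partition path field means SIMPLE, anything else
means COMPLEX) is an assumption based on the three types named in this PR.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of the resolution order; not the actual Hudi implementation.
class KeyGenResolutionSketch {
  static final String CLASS_KEY = "hoodie.datasource.write.keygenerator.class";
  static final String TYPE_KEY = "hoodie.datasource.write.keygenerator.type";
  static final String RECORD_KEY_FIELD = "hoodie.datasource.write.recordkey.field";
  static final String PARTITION_PATH_FIELD = "hoodie.datasource.write.partitionpath.field";

  // Explicit class wins, then explicit type; inference runs only if neither is set.
  static String resolve(Map<String, String> props) {
    String cls = props.get(CLASS_KEY);
    if (cls != null && !cls.isEmpty()) {
      return "class:" + cls;
    }
    String type = props.get(TYPE_KEY);
    if (type != null && !type.isEmpty()) {
      return "type:" + type.toUpperCase(Locale.ROOT);
    }
    return "type:" + inferType(
        props.getOrDefault(RECORD_KEY_FIELD, ""),
        props.getOrDefault(PARTITION_PATH_FIELD, ""));
  }

  // Assumed rule: the comma-separated field counts decide the key generator type.
  static String inferType(String recordKeyFields, String partitionFields) {
    if (partitionFields.trim().isEmpty()) {
      return "NON_PARTITION";
    }
    int numKeys = recordKeyFields.split(",").length;
    int numParts = partitionFields.split(",").length;
    return (numKeys == 1 && numParts == 1) ? "SIMPLE" : "COMPLEX";
  }

  public static void main(String[] args) {
    Map<String, String> props = new HashMap<>();
    props.put(RECORD_KEY_FIELD, "id");
    props.put(PARTITION_PATH_FIELD, "region,dt");
    // No key generator config is set, so the type is inferred from the fields.
    System.out.println(resolve(props));
  }
}
```

Setting `hoodie.datasource.write.keygenerator.type` in `props` would bypass
`inferType` in this sketch, and setting
`hoodie.datasource.write.keygenerator.class` would bypass both.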
### Impact
In most cases (for the key generator types SIMPLE, COMPLEX, and NON_PARTITION),
users no longer need to explicitly specify the key generator type. The key
generator is automatically inferred.
### Risk level
low
### Documentation Update
The documentation needs to be updated so that code examples no longer present
key generator configs as mandatory.
### Contributor's checklist
- [ ] Read through [contributor's
guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed