prashantwason opened a new pull request #1687:
URL: https://github.com/apache/hudi/pull/1687
## What is the purpose of the pull request
This PR creates a cleaner interface for HUDI to integrate different types of
base file formats (ORC, HFile, etc.).
## Brief change log
The type of the base file format is chosen during the initial creation of
the HUDI dataset. The name of the base file format is written to the
hoodie.properties file and is available to the rest of the modules via
HoodieTableConfig.
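As a rough illustration of the lookup described above, the sketch below reads
a format name from properties and falls back to Parquet when the key is
absent. The key name `hoodie.table.base.file.format`, the `BaseFileFormat`
enum, and the Parquet default are assumptions made for this example, not
taken from the PR; the real logic lives in HoodieTableConfig.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Locale;
import java.util.Properties;

class BaseFileFormatSketch {
    // Hypothetical enum for this sketch; not the enum used in the PR.
    enum BaseFileFormat { PARQUET, ORC, HFILE }

    // Assumed property key; check HoodieTableConfig for the actual name.
    static final String FORMAT_KEY = "hoodie.table.base.file.format";

    // Resolve the configured format, defaulting to PARQUET (an assumption)
    // when the table properties do not name one.
    static BaseFileFormat resolve(Properties props) {
        String name = props.getProperty(FORMAT_KEY, BaseFileFormat.PARQUET.name());
        return BaseFileFormat.valueOf(name.trim().toUpperCase(Locale.ROOT));
    }

    // Parse properties-file text, standing in for reading hoodie.properties.
    static Properties parse(String text) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(text));
        return props;
    }

    public static void main(String[] args) throws IOException {
        Properties props = parse(
            "hoodie.table.name=trips\n" + FORMAT_KEY + "=HFILE\n");
        System.out.println(resolve(props));
    }
}
```

An unknown format name makes `valueOf` throw `IllegalArgumentException`, so
a misconfigured table would fail at lookup time in this sketch rather than
at the first read or write.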
*Writer side*: HoodieStorageWriter
*Reader side*: HoodieStorageReader
*InputFormat side*: HoodieInputFormat and HoodieRealtimeInputFormat
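The split above can be pictured as a factory that picks a format-specific
implementation behind common writer and reader interfaces. The sketch below
is only an illustration under stated assumptions: `RecordWriter`,
`RecordReader`, `Store`, `open`, and both in-memory stand-ins are
hypothetical names, not the signatures of HoodieStorageWriter or
HoodieStorageReader.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

class StorageFactorySketch {
    // Hypothetical format enum for this sketch.
    enum Format { PARQUET, HFILE, ORC }

    // Hypothetical, simplified writer and reader interfaces.
    interface RecordWriter { void write(String key, String value); }
    interface RecordReader { List<String> keys(); }
    interface Store extends RecordWriter, RecordReader { }

    // In-memory stand-in that returns keys in insertion order.
    static final class AppendStore implements Store {
        private final List<String> keys = new ArrayList<>();
        public void write(String key, String value) { keys.add(key); }
        public List<String> keys() { return new ArrayList<>(keys); }
    }

    // In-memory stand-in that returns keys in sorted order, loosely
    // mirroring the sorted key-value layout of an HFile.
    static final class SortedStore implements Store {
        private final TreeMap<String, String> records = new TreeMap<>();
        public void write(String key, String value) { records.put(key, value); }
        public List<String> keys() { return new ArrayList<>(records.keySet()); }
    }

    // Callers depend only on the interfaces; the format picks the class.
    static Store open(Format format) {
        switch (format) {
            case PARQUET: return new AppendStore();
            case HFILE:   return new SortedStore();
            default:
                throw new UnsupportedOperationException(
                    "No implementation in this sketch for " + format);
        }
    }

    public static void main(String[] args) {
        Store store = open(Format.HFILE);
        store.write("b", "2");
        store.write("a", "1");
        System.out.println(store.keys());
    }
}
```

Supporting another format in this sketch means adding one class and one
`case`, with no change to the calling code.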
To test the interface, I have also implemented support for the HFile base
file format.
TODO:
1. HUDI-960 and HUDI-961 are being implemented to abstract the DataBlock in
the log files. Currently the MOR table log blocks are still saved in Avro
format.
## Verify this pull request
All existing tests should work and pass.
Hoodie Table tests have been parameterized to run with various base file
formats.
## Committer checklist
- [ ] Has a corresponding JIRA in PR title & commit
- [ ] Commit message is descriptive of the change
- [ ] CI is green
- [ ] Necessary doc changes done or have another open PR
- [ ] For large changes, please consider breaking it into sub-tasks under
an umbrella JIRA.