nbalajee opened a new pull request #2344:
URL: https://github.com/apache/hudi/pull/2344
## What is the purpose of the pull request
When hudi-test-suite reads records from existing parquet files, it uses the
reader schema (the original schema used to write the parquet file). Hudi
writer core, in contrast, uses the evolved/latest writer schema when reading
existing parquet files. (The difference is visible only when testing schema
evolution with an evolved schema.)
To mimic the Hudi writer core behavior, this change uses the latest writer
schema for reading the parquet files.
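
To illustrate why the two schemas can produce different records, here is a minimal, library-free sketch. It is not Hudi or Avro code: `SchemaProjectionSketch`, `project`, and the field names are hypothetical, and a "schema" is modeled simply as a map from field name to default value. It assumes the evolved schema only adds fields that carry defaults.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SchemaProjectionSketch {

    // Hypothetical stand-in for schema resolution: keep stored values for
    // fields the schema knows about, and fill in the schema's default for
    // fields that are missing from the stored record.
    static Map<String, Object> project(Map<String, Object> stored,
                                       Map<String, Object> schemaDefaults) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> field : schemaDefaults.entrySet()) {
            String name = field.getKey();
            out.put(name, stored.containsKey(name) ? stored.get(name) : field.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        // A record as it was written with the original schema.
        Map<String, Object> stored = new LinkedHashMap<>();
        stored.put("id", 1);
        stored.put("name", "a");

        // Original schema: only the fields present when the file was written.
        Map<String, Object> original = new LinkedHashMap<>();
        original.put("id", 0);
        original.put("name", "");

        // Evolved schema: adds a field (assumed here to have a default).
        Map<String, Object> evolved = new LinkedHashMap<>(original);
        evolved.put("city", "unknown");

        System.out.println("read with original schema: " + project(stored, original));
        System.out.println("read with evolved schema:  " + project(stored, evolved));
    }
}
```

In this sketch, reading with the original schema never surfaces the added field, while reading with the evolved schema yields records shaped like those the writer produces. The two results differ only once the schema has evolved.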
## Brief change log
- Modified DFSHoodieDatasetInputReader, readParquetOrLogFiles() to use the
latest writer schema.
## Verify this pull request
This pull request is already covered by existing tests, such as
testSimpleHoodieDatasetReader().
## Committer checklist
- [x] Has a corresponding JIRA in PR title & commit
- [x] Commit message is descriptive of the change
- [x] CI is green
- [ ] Necessary doc changes done or have another open PR
- [ ] For large changes, please consider breaking it into sub-tasks under
an umbrella JIRA.