kumudkumartirupati opened a new pull request, #5995:
URL: https://github.com/apache/hudi/pull/5995
## What is the purpose of the pull request
This PR fixes the following bugs in AWS Glue Meta Sync that occur after
schema evolution.
### Bugs
* CreateTable filters out the partition columns, but updating the table
doesn't apply the same filter to the schema, which leads to duplicate
columns and incompatible metadata in Hive.
* Table properties are not updated when the schema changes because of an
incorrect `nonEmpty` predicate check.
* The function that updates table parameters overwrites the `EXTERNAL` and
`last_commit_time_sync` properties, which causes all partitions to be
reloaded.
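To illustrate the third bug, here is a minimal sketch that uses plain `java.util.Map`s in place of the Glue table parameters. The class and method names (`TableParameterSketch`, `replaceParameters`, `mergeParameters`) are hypothetical, and merging is only one assumed way to keep existing keys; it is not taken from the Hudi or AWS SDK code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: plain maps stand in for the Glue table "parameters".
// Names are illustrative only, not Hudi or AWS SDK APIs.
final class TableParameterSketch {

    // Buggy pattern: the existing parameters are ignored and replaced wholesale,
    // so keys missing from newParams (e.g. EXTERNAL) disappear.
    static Map<String, String> replaceParameters(Map<String, String> existing,
                                                 Map<String, String> newParams) {
        return new HashMap<>(newParams);
    }

    // Safer pattern: copy the existing parameters and overlay the new ones,
    // so unrelated keys such as EXTERNAL and last_commit_time_sync survive.
    static Map<String, String> mergeParameters(Map<String, String> existing,
                                               Map<String, String> newParams) {
        Map<String, String> merged = new HashMap<>(existing);
        merged.putAll(newParams);
        return merged;
    }
}
```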
## Brief change log
* Extracts the `getColumnsFromSchema` logic into a shared method so that
columns (excluding partition columns) are derived consistently when both
creating and updating tables.
* Replaces `nonEmpty(tableProperties)` with `isNullOrEmpty(tableProperties)`
so that `tableProperties` are applied during schema updates unless they are
null or empty.
* `updateTableProperties` no longer overwrites the existing table parameters,
so the `EXTERNAL` and `last_commit_time_sync` properties are not lost during
schema updates.
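The first two changes can be sketched as follows, assuming the table schema is modeled as an ordered map from column name to type. `GlueColumnSketch` and these signatures are hypothetical; only the names `getColumnsFromSchema` and `isNullOrEmpty` come from this PR, and the real methods operate on Hudi and Glue types rather than plain maps.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: schema = ordered map of column name -> type.
// Signatures are illustrative, not the actual Hudi implementation.
final class GlueColumnSketch {

    // One shared helper for both the create and update paths, so partition
    // columns are excluded the same way in both cases. Column order is preserved.
    static Map<String, String> getColumnsFromSchema(Map<String, String> schema,
                                                    List<String> partitionFields) {
        Map<String, String> columns = new LinkedHashMap<>();
        for (Map.Entry<String, String> field : schema.entrySet()) {
            if (!partitionFields.contains(field.getKey())) {
                columns.put(field.getKey(), field.getValue());
            }
        }
        return columns;
    }

    // Guard for the properties update: skip it only when there is nothing to apply.
    static boolean isNullOrEmpty(Map<String, String> properties) {
        return properties == null || properties.isEmpty();
    }
}
```

Because the create and update paths would both call the same helper, a partition column such as `dt` cannot end up in the regular column list after an update while also being registered as a partition key.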
## Verify this pull request
- *Manually verified the change by running the sync job to create and
update tables.*
## Committer checklist
- [x] Has a corresponding JIRA in PR title & commit
- [x] Commit message is descriptive of the change
- [x] CI is green
- [ ] Necessary doc changes done or have another open PR
- [ ] For large changes, please consider breaking it into sub-tasks under
an umbrella JIRA.
--
This is an automated message from the Apache Git Service.