yihua opened a new pull request #4544:
URL: https://github.com/apache/hudi/pull/4544
## What is the purpose of the pull request
This PR makes the Kafka Connect Sink for Hudi write empty commits when there
are no new messages from the Kafka topic. This avoids the constant rollbacks
that otherwise occur while the topic is idle. Because the write commit logic,
including archival, now always runs whether or not there are new messages,
this also fixes the problem of rollbacks never being archived while the topic
has no new messages.
## Brief change log
- Removes the check on the size of the write status list collected from all
participants in `ConnectTransactionCoordinator`.
- Adds a new test for empty status list.
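
The effect of removing the size check can be sketched with a simplified model. This is not the code in `ConnectTransactionCoordinator`: the class `EmptyCommitSketch`, its nested types, and both methods are hypothetical names made up for illustration, and the model assumes that a skipped commit leaves an instant that is later rolled back, as the purpose section describes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative model only; none of these types exist in Hudi.
public class EmptyCommitSketch {

    /** Hypothetical stand-in for the write status reported by a participant. */
    public static final class WriteStatus {
        public final long recordsWritten;

        public WriteStatus(long recordsWritten) {
            this.recordsWritten = recordsWritten;
        }
    }

    /** Hypothetical stand-in for the table timeline. */
    public static final class Timeline {
        public final List<String> commits = new ArrayList<>();
        public int rollbacks = 0;
        public int archivalRuns = 0;

        void commit(String commitTime, List<WriteStatus> statuses) {
            long records = 0;
            for (WriteStatus status : statuses) {
                records += status.recordsWritten;
            }
            commits.add(commitTime + " records=" + records);
            archivalRuns++; // in this model, archival runs only on the commit path
        }

        void rollback() {
            rollbacks++;
        }
    }

    /** Assumed old behavior: an empty status list skips the commit, so the instant is rolled back. */
    public static void commitOnlyIfNonEmpty(Timeline timeline, String commitTime, List<WriteStatus> statuses) {
        if (statuses.isEmpty()) {
            timeline.rollback();
            return;
        }
        timeline.commit(commitTime, statuses);
    }

    /** Behavior described by this PR: always commit, even when no records were written. */
    public static void commitAlways(Timeline timeline, String commitTime, List<WriteStatus> statuses) {
        timeline.commit(commitTime, statuses);
    }

    public static void main(String[] args) {
        List<WriteStatus> none = Collections.emptyList();

        Timeline before = new Timeline();
        commitOnlyIfNonEmpty(before, "t1", none);
        commitOnlyIfNonEmpty(before, "t2", none);

        Timeline after = new Timeline();
        commitAlways(after, "t1", none);
        commitAlways(after, "t2", none);

        System.out.println("before: commits=" + before.commits.size()
                + " rollbacks=" + before.rollbacks + " archivalRuns=" + before.archivalRuns);
        System.out.println("after:  commits=" + after.commits.size()
                + " rollbacks=" + after.rollbacks + " archivalRuns=" + after.archivalRuns);
    }
}
```

In this model an idle topic produces only rollbacks and no archival runs under the old check, while the unconditional path produces one empty commit and one archival run per round.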
## Verify this pull request
This change added tests and can be verified as follows:
- Run the Kafka Connect Sink for Hudi using the Quick Start Guide
- Publish some messages to the Kafka topic: `bash setupKafka.sh -n 100 -b 6`
- Wait for some time so the Sink ingests all messages and writes empty
commits
- Publish more messages to the topic: `bash setupKafka.sh -n 100 -b 6 -o 600 -t`
- Verify the table timeline using hudi-cli:
```
hudi:hudi-test-topic->commits show
╔═══════════════════╤═════════════════════╤═══════════════════╤═════════════════════╤══════════════════════════╤═══════════════════════╤══════════════════════════════╤══════════════╗
║ CommitTime        │ Total Bytes Written │ Total Files Added │ Total Files Updated │ Total Partitions Written │ Total Records Written │ Total Update Records Written │ Total Errors ║
╠═══════════════════╪═════════════════════╪═══════════════════╪═════════════════════╪══════════════════════════╪═══════════════════════╪══════════════════════════════╪══════════════╣
║ 20220109184255282 │ 76.1 KB             │ 0                 │ 20                  │ 5                        │ 300                   │ 300                          │ 0            ║
╟───────────────────┼─────────────────────┼───────────────────┼─────────────────────┼──────────────────────────┼───────────────────────┼──────────────────────────────┼──────────────╢
║ 20220109184129070 │ 75.7 KB             │ 0                 │ 20                  │ 5                        │ 300                   │ 300                          │ 0            ║
╟───────────────────┼─────────────────────┼───────────────────┼─────────────────────┼──────────────────────────┼───────────────────────┼──────────────────────────────┼──────────────╢
║ 20220109183955630 │ 0.0 B               │ 0                 │ 0                   │ 0                        │ 0                     │ 0                            │ 0            ║
╟───────────────────┼─────────────────────┼───────────────────┼─────────────────────┼──────────────────────────┼───────────────────────┼──────────────────────────────┼──────────────╢
║ 20220109183755160 │ 0.0 B               │ 0                 │ 0                   │ 0                        │ 0                     │ 0                            │ 0            ║
╟───────────────────┼─────────────────────┼───────────────────┼─────────────────────┼──────────────────────────┼───────────────────────┼──────────────────────────────┼──────────────╢
║ 20220109183554995 │ 0.0 B               │ 0                 │ 0                   │ 0                        │ 0                     │ 0                            │ 0            ║
╟───────────────────┼─────────────────────┼───────────────────┼─────────────────────┼──────────────────────────┼───────────────────────┼──────────────────────────────┼──────────────╢
║ 20220109183354904 │ 0.0 B               │ 0                 │ 0                   │ 0                        │ 0                     │ 0                            │ 0            ║
╟───────────────────┼─────────────────────┼───────────────────┼─────────────────────┼──────────────────────────┼───────────────────────┼──────────────────────────────┼──────────────╢
║ 20220109183225656 │ 75.7 KB             │ 0                 │ 20                  │ 5                        │ 300                   │ 300                          │ 0            ║
╟───────────────────┼─────────────────────┼───────────────────┼─────────────────────┼──────────────────────────┼───────────────────────┼──────────────────────────────┼──────────────╢
║ 20220109183055068 │ 71.8 KB             │ 0                 │ 16                  │ 5                        │ 300                   │ 300                          │ 0            ║
╚═══════════════════╧═════════════════════╧═══════════════════╧═════════════════════╧══════════════════════════╧═══════════════════════╧══════════════════════════════╧══════════════╝
```
## Committer checklist
- [ ] Has a corresponding JIRA in PR title & commit
- [ ] Commit message is descriptive of the change
- [ ] CI is green
- [ ] Necessary doc changes done or have another open PR
- [ ] For large changes, please consider breaking it into sub-tasks under
an umbrella JIRA.