fmorillo7694 opened a new pull request, #249:
URL: https://github.com/apache/flink-connector-aws/pull/249
## Summary
- Enable the `kinesis` connector to accept upsert changelog streams (GROUP BY, deduplication, streaming joins) by detecting PRIMARY KEY on the table
- DELETE/UPDATE_BEFORE rows are written as empty-payload tombstone records
- Primary key fields are used as the Kinesis partition key for consistent shard routing
- No new configuration needed: just define `PRIMARY KEY (col) NOT ENFORCED`
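
To illustrate why keying on the primary key gives consistent shard routing, here is a minimal standalone sketch. It is not the connector's partitioner: the class name, the map-based row, and the `|` delimiter are all assumptions made for the example. The point it shows is that only primary key fields feed the partition key, so every change to the same key produces the same string and therefore lands on the same shard.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Standalone sketch; hypothetical names, not the connector's real classes. */
final class PrimaryKeyPartitionKeySketch {

    /**
     * Builds a partition key from the primary key fields only.
     * Assumption: values are stringified and joined with a delimiter.
     */
    static String partitionKey(
            Map<String, Object> row, List<String> primaryKeyFields, String delimiter) {
        return primaryKeyFields.stream()
                .map(field -> String.valueOf(row.get(field)))
                .collect(Collectors.joining(delimiter));
    }

    public static void main(String[] args) {
        List<String> primaryKey = List.of("user_id", "region");
        // Two versions of the same logical row: only the non-key column differs.
        String first = partitionKey(
                Map.of("user_id", 42, "region", "eu", "cnt", 1), primaryKey, "|");
        String second = partitionKey(
                Map.of("user_id", 42, "region", "eu", "cnt", 2), primaryKey, "|");
        System.out.println(first + " / " + second);
    }
}
```

Because non-key columns such as the hypothetical `cnt` never enter the key, an update or a tombstone for a given row follows the same route as the original insert.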
## Changes
- **KinesisDynamicSink**: Added `upsertMode` flag and `UpsertSerializationSchemaWrapper` that writes tombstones for deletes and normalizes RowKind for inserts
- **KinesisDynamicTableFactory**: Detects primary key on the table, enables upsert mode, and overrides the partitioner to use primary key fields
- **KinesisUpsertSinkSerializationTest**: Unit tests for serialization behavior across all RowKind types
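
The wrapper's behavior can be sketched without any Flink dependency. The following is a rough illustration, not the code in this PR: `Kind` and `Row` are hypothetical stand-ins for Flink's `RowKind` and `RowData`, and the reading of "normalizes RowKind" as "present the row to the inner format as INSERT" is an assumption.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

/** Standalone sketch; none of these types are the connector's real classes. */
final class UpsertSerializationSketch {

    /** Hypothetical stand-in for Flink's RowKind. */
    enum Kind { INSERT, UPDATE_BEFORE, UPDATE_AFTER, DELETE }

    /** Hypothetical row: a change kind plus an opaque value. */
    static final class Row {
        final Kind kind;
        final String value;

        Row(Kind kind, String value) {
            this.kind = kind;
            this.value = value;
        }
    }

    /** Empty payload used as the tombstone marker. */
    static final byte[] TOMBSTONE = new byte[0];

    /** Wraps an inner serializer: retractions become tombstones, the rest become inserts. */
    static byte[] serialize(Row row, Function<Row, byte[]> inner) {
        if (row.kind == Kind.DELETE || row.kind == Kind.UPDATE_BEFORE) {
            return TOMBSTONE;
        }
        // Assumption: the inner format only understands INSERT, so UPDATE_AFTER
        // is handed over as INSERT. A copy is passed so the caller's row keeps its kind.
        return inner.apply(new Row(Kind.INSERT, row.value));
    }

    public static void main(String[] args) {
        Function<Row, byte[]> inner =
                r -> (r.kind + ":" + r.value).getBytes(StandardCharsets.UTF_8);
        for (Kind kind : Kind.values()) {
            byte[] payload = serialize(new Row(kind, "total=3"), inner);
            System.out.println(kind + " -> " + payload.length + " bytes");
        }
    }
}
```

In this sketch a consumer can tell a delete from an upsert by payload length alone, which is what "empty-payload tombstone" implies; how the real wrapper preserves the row's original kind after serialization may differ from the copy used here.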
## Test plan
- [x] Unit tests for UpsertSerializationSchemaWrapper (INSERT, UPDATE_AFTER, DELETE, UPDATE_BEFORE)
- [x] Unit tests for RowKind preservation after serialization
- [x] Existing KinesisDynamicTableSinkFactoryTest still passes (backward compatible)
- [x] Checkstyle, Spotless, ArchUnit all pass
- [x] Verified end-to-end against real KDS: GROUP BY query successfully wrote aggregated records with correct primary-key-based partition keys