dannycranmer opened a new pull request #13102:
URL: https://github.com/apache/flink/pull/13102
## What is the purpose of the change
Adding `FanOutRecordPublisher`, a new implementation of `RecordPublisher` to
add support for Enhanced Fan Out (EFO) from Kinesis Streams using the
`FlinkKinesisConsumer`.
## Brief change log
- `FanOutRecordPublisher`
- Consumes records from Kinesis using EFO via `KinesisV2Proxy`
- `FanOutShardSubscriber` used to split consumption and consumer
processing and error handling
- `ShardConsumer`
- Starting position from `LATEST` is converted to `AT_TIMESTAMP` `now()`
to prevent data loss if error occurs on first read
- `RecordPublisherFactory`
- Added a `close()` method to allow `KinesisAsynClient` to be closed on
shutdown
- `FullJitterBackoff`
- Pulled the jitter backoff implementation into a separate class and
referenced throughout
- `KinesisProxyV2Interface` / `KinesisProxyV2`
- Added `SubscribeToShard` thin proxy layer
- `AwsV2Util`
- Updated client configurations for EFO:
- `maxConcurrency`: Added a new configuration to allow EFO concurrency
to be defined separately from other services (list shards etc)
- `protocol`: Setting to `HTTP2`
-
`connectionAcquisitionTimeout`/`healthCheckPingPeriod`/`initialWindowSize` set
to handle startup with large shard count
## Verifying this change
This change added tests and can be verified as follows:
- `FlinkKinesisConsumerTest`
- `KinesisDataFetcherTest`
- `ShardConsumerFanOutTest`
- `ShardConsumerTest`
- `FanOutRecordPublisherConfigurationTest`
- `FanOutRecordPublisherTest`
- `KinesisProxyV2Test`
- `AwsV2UtilTest`
- `KinesisConfigUtilTest`
Fake behaviours for EFO added:
- `FakeKinesisFanOutBehavioursFactory`
## Does this pull request potentially affect one of the following parts:
- Dependencies (does it add or upgrade a dependency): no
- The public API, i.e., is any changed class annotated with
`@Public(Evolving)`: no
- The serializers: no
- The runtime per-record code paths (performance sensitive): yes (added
EFO consumption)
- Anything that affects deployment or recovery: JobManager (and its
components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: no
- The S3 file system connector: no
## Documentation
- Does this pull request introduce a new feature? yes
- If yes, how is the feature documented? There is a [follow up
task](https://issues.apache.org/jira/browse/FLINK-18870) for documentation
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]