dannycranmer opened a new pull request #13102:
URL: https://github.com/apache/flink/pull/13102


   ## What is the purpose of the change
   
   Adds `FanOutRecordPublisher`, a new implementation of `RecordPublisher`, so 
that the `FlinkKinesisConsumer` can consume from Kinesis Streams using 
Enhanced Fan Out (EFO).
   
   ## Brief change log
   
   - `FanOutRecordPublisher`
     - Consumes records from Kinesis using EFO via `KinesisProxyV2`
     - `FanOutShardSubscriber` is used to separate record consumption from 
consumer processing and error handling
   - `ShardConsumer`
     - A `LATEST` starting position is converted to `AT_TIMESTAMP` with the 
current time (`now()`) to prevent data loss if an error occurs on the first read
   - `RecordPublisherFactory` 
     - Added a `close()` method to allow `KinesisAsyncClient` to be closed on 
shutdown
   - `FullJitterBackoff`
     - Pulled the jitter backoff implementation into a separate class and 
referenced it throughout
   - `KinesisProxyV2Interface` / `KinesisProxyV2`
     - Added a thin proxy layer for `SubscribeToShard`
   - `AwsV2Util`
     - Updated client configurations for EFO:
       - `maxConcurrency`: Added a new configuration to allow EFO concurrency 
to be defined separately from other services (list shards, etc.)
       - `protocol`: Set to `HTTP2`
       - `connectionAcquisitionTimeout` / `healthCheckPingPeriod` / 
`initialWindowSize`: Set to handle startup with a large shard count
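
For reviewers unfamiliar with the technique, the full jitter backoff mentioned 
above can be sketched roughly as follows. This is a minimal illustration, not 
the code in this PR: the class name `FullJitterBackoffSketch`, the method name 
and the parameter names are all assumptions made for the example.

```java
import java.util.Random;

/**
 * Hypothetical sketch of full-jitter exponential backoff.
 * Class, method and parameter names are illustrative assumptions.
 */
final class FullJitterBackoffSketch {

    private final Random random;

    FullJitterBackoffSketch(Random random) {
        this.random = random;
    }

    /**
     * Returns a delay drawn uniformly from [0, ceiling), where
     * ceiling = min(maxMillis, baseMillis * power^attempt).
     */
    long calculateBackoffMillis(long baseMillis, long maxMillis, double power, int attempt) {
        // Exponential growth, capped so the delay never exceeds maxMillis.
        long ceiling = (long) Math.min(maxMillis, baseMillis * Math.pow(power, attempt));
        // "Full" jitter: randomise across the whole range instead of adding a small offset.
        return (long) (random.nextDouble() * ceiling);
    }

    /** Blocks the calling thread for the given delay. */
    void sleep(long millis) throws InterruptedException {
        Thread.sleep(millis);
    }
}
```

Randomising over the whole range spreads retries from many shard consumers 
over time, which is presumably why a single shared class is useful here.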
   
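The `LATEST` to `AT_TIMESTAMP` conversion in `ShardConsumer` can be pictured 
with the sketch below. It assumes that `LATEST` is resolved again on every 
request, so a retry after a failed first read could skip records that arrived 
in between. The types and names (`StartingPositionSketch`, `pinLatest`) are 
hypothetical and do not mirror the classes in this PR.

```java
import java.time.Instant;

/**
 * Hypothetical model of a shard starting position.
 * All names here are illustrative assumptions.
 */
final class StartingPositionSketch {

    enum IteratorType { LATEST, TRIM_HORIZON, AT_TIMESTAMP }

    final IteratorType type;
    final Instant timestamp; // only meaningful for AT_TIMESTAMP, otherwise null

    StartingPositionSketch(IteratorType type, Instant timestamp) {
        this.type = type;
        this.timestamp = timestamp;
    }

    /**
     * Pins a LATEST request to a concrete timestamp so that a retry
     * resumes from the same point; other positions are returned unchanged.
     */
    static StartingPositionSketch pinLatest(StartingPositionSketch requested, Instant now) {
        if (requested.type == IteratorType.LATEST) {
            return new StartingPositionSketch(IteratorType.AT_TIMESTAMP, now);
        }
        return requested;
    }
}
```

Passing `now` in as a parameter, rather than reading the clock inside the 
method, keeps the conversion deterministic and easy to test.
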
   ## Verifying this change
   
   This change added tests and can be verified as follows:
   - `FlinkKinesisConsumerTest`
   - `KinesisDataFetcherTest`
   - `ShardConsumerFanOutTest`
   - `ShardConsumerTest`
   - `FanOutRecordPublisherConfigurationTest`
   - `FanOutRecordPublisherTest`
   - `KinesisProxyV2Test`
   - `AwsV2UtilTest`
   - `KinesisConfigUtilTest`
   
   Fake behaviours for EFO added:
   - `FakeKinesisFanOutBehavioursFactory`
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): no
     - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: no
     - The serializers: no
     - The runtime per-record code paths (performance sensitive): yes (added 
EFO consumption)
     - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: no
     - The S3 file system connector: no
   
   ## Documentation
   
     - Does this pull request introduce a new feature? yes
     - If yes, how is the feature documented? There is a [follow-up 
task](https://issues.apache.org/jira/browse/FLINK-18870) for documentation
   



