mosche commented on code in PR #23540:
URL: https://github.com/apache/beam/pull/23540#discussion_r1142922485
##########
sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/kinesis/KinesisIO.java:
##########
@@ -147,6 +150,42 @@
* Read#withCustomRateLimitPolicy(RateLimitPolicyFactory)}. This requires
implementing {@link
* RateLimitPolicy} with a corresponding {@link RateLimitPolicyFactory}.
*
+ * <h4>Enhanced Fan-Out</h4>
+ *
+ * Kinesis IO supports Consumers with Dedicated Throughput (Enhanced Fan-Out,
EFO). This type of
+ * consumer doesn't have to contend with other consumers that are receiving
data from the stream.
+ *
+ * <p>More details can be found here: <a
+ *
href="https://docs.aws.amazon.com/streams/latest/dev/enhanced-consumers.html">Consumers
with
+ * Dedicated Throughput</a>
+ *
+ * <p>EFO can be enabled for one or more {@link KinesisIO.Read} instances via
pipeline options:
Review Comment:
@psolomin Keep in mind, there's still other runners that don't share the
behavior of Flink. With Dataflow I'm 99% sure you can change the configuration
when updating / resuming a streaming pipeline
(https://cloud.google.com/dataflow/docs/guides/updating-a-pipeline). Though,
happy to confirm that with some Googlers if you prefer. Honestly, I'd be rather
reluctant to add a non-functional configuration such as
`withEnhancedFanOutEnabled()` that then fully relies on configuration via
pipeline options.
I feel to comply with Beam standards it should be possible to configure this
feature alone using the read spec (even though that is very limited when used
with Flink).
Regarding the inconsistency discussed above (enabled via read spec, but how
to disable then), it wouldn't be hard to use the pipeline options mechanism as
well using a null mapping:
```
--kinesisIOConsumerArns={"stream1":null}
```
Just to get a 3rd opinion on his, @aromanenko-dev any thoughts from your
side?
One use case we discussed is to temporarily enabled EFO consumers to
initially backfill a pipeline with max. throughput, but then disable EFO again
once the pipeline caught up to save on costs.
Context to this discussion is the question how to enable Enhanced Fanout
Consumers for Kinesis to support the above use case as well. Problem is that
the read specification is serialized together with the source in Flink
savepoints. So when resuming the pipeline any changes to the read configuration
won't be noticed - and EFO can't be disabled. The only way to do this (for
Flink) is to configure EFO consumers via pipeline options.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]