hlteoh37 commented on code in PR #765:
URL: https://github.com/apache/flink-web/pull/765#discussion_r1856903008


##########
docs/content/posts/2024-11-25-whats-new-aws-connectors-5.0.0.md:
##########
@@ -0,0 +1,148 @@
+---
+title: "Introducing the new Amazon Kinesis Data Streams and Amazon DynamoDB Streams sources"
+date: "2024-11-25T18:00:00.000Z"
+authors:
+- hong:
+  name: "Hong Liang Teoh"
+---
+
+
+We are pleased to introduce updated versions of the Amazon Kinesis Data Streams and Amazon DynamoDB Streams sources. Built on the [FLIP-27 source interface](https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface), these newer connectors introduce 7 new features and are compatible with Flink 2.0.
+
+The new [`KinesisStreamsSource`](https://github.com/apache/flink-connector-aws/blob/v5.0/flink-connector-aws/flink-connector-aws-kinesis-streams/src/main/java/org/apache/flink/connector/kinesis/source/KinesisStreamsSource.java) replaces the legacy [`FlinkKinesisConsumer`](https://github.com/apache/flink-connector-aws/blob/v5.0/flink-connector-aws/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/FlinkKinesisConsumer.java), and the new [`DynamoDbStreamsSource`](https://github.com/apache/flink-connector-aws/blob/v5.0/flink-connector-aws/flink-connector-dynamodb/src/main/java/org/apache/flink/connector/dynamodb/source/DynamoDbStreamsSource.java) replaces the legacy [`FlinkDynamoDBStreamsConsumer`](https://github.com/apache/flink-connector-aws/blob/v5.0/flink-connector-aws/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/FlinkDynamoDBStreamsConsumer.java). The new connectors are available for Flink 1.19 onwards and AWS Connectors version 5.0.0 onwards. For more information, see the section on [Dependencies](#dependencies).
+
+In this blog post, we dive into the motivation for the new source connectors, walk through the improvements they introduce, and provide migration guidance for users.
+
+## Dependencies
+
+<table>
+  <tr>
+    <th>Connector</th>
+    <th>API</th>
+    <th>Dependency</th>
+    <th>Usage</th>
+  </tr>
+  <tr>
+    <td>Amazon Kinesis Data Streams source</td>
+    <td>DataStream<br>Table API</td>
+    <td>Use the <code>flink-connector-aws-kinesis-streams</code> artifact. See the <a href="https://nightlies.apache.org/flink/flink-docs-stable/docs/connectors/datastream/kinesis/">
+      Flink Kinesis connector documentation</a> for details.
+    </td>
+    <td>
+      Use the fluent
+      <a href="https://github.com/apache/flink-connector-aws/blob/v5.0/flink-connector-aws/flink-connector-aws-kinesis-streams/src/main/java/org/apache/flink/connector/kinesis/source/KinesisStreamsSourceBuilder.java">
+      KinesisStreamsSourceBuilder</a> to create the source, as shown in the example below this table. See the
+      <a href="https://nightlies.apache.org/flink/flink-docs-stable/docs/connectors/datastream/kinesis/">
+      Flink Kinesis connector documentation</a> for more details.
+    </td>
+  </tr>
+  <tr>
+    <td>Amazon Kinesis Data Streams source</td>
+    <td>SQL</td>
+    <td>Use the <code>flink-sql-connector-aws-kinesis-streams</code> artifact. See the <a href="https://nightlies.apache.org/flink/flink-docs-stable/docs/connectors/table/kinesis/">
+      Flink SQL Kinesis connector documentation</a> for details.
+    </td>
+    <td>
+      Use the table identifier <code>kinesis</code>. See the
+      <a href="https://nightlies.apache.org/flink/flink-docs-stable/docs/connectors/table/kinesis/">
+      Flink SQL Kinesis connector documentation</a> for configuration and usage details.
+    </td>
+  </tr>
+  <tr>
+    <td>Amazon DynamoDB Streams source</td>
+    <td>DataStream</td>
+    <td>Use the <code>flink-connector-dynamodb</code> artifact. See the <a href="https://nightlies.apache.org/flink/flink-docs-stable/docs/connectors/datastream/dynamodb/">
+      Flink DynamoDB connector documentation</a> for details.
+    </td>
+    <td>
+      Use the fluent
+      <a href="https://github.com/apache/flink-connector-aws/blob/v5.0/flink-connector-aws/flink-connector-dynamodb/src/main/java/org/apache/flink/connector/dynamodb/source/DynamoDbStreamsSourceBuilder.java">
+      DynamoDbStreamsSourceBuilder</a> to create the source. See the
+      <a href="https://nightlies.apache.org/flink/flink-docs-stable/docs/connectors/datastream/dynamodb/">
+      Flink DynamoDB connector documentation</a> for more details.
+    </td>
+  </tr>
+</table>
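+
+As a quick illustration of the DataStream API usage, here is a minimal sketch of building a `KinesisStreamsSource` with the fluent builder. The stream ARN is a placeholder, and the configuration shown is the bare minimum; see the connector documentation linked above for the full set of options.
+
+```java
+import org.apache.flink.api.common.eventtime.WatermarkStrategy;
+import org.apache.flink.api.common.serialization.SimpleStringSchema;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.connector.kinesis.source.KinesisStreamsSource;
+import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
+
+StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
+
+// Optional source configuration, e.g. the initial position on the stream.
+Configuration sourceConfig = new Configuration();
+
+// Build the source with the fluent builder; the stream ARN below is a placeholder.
+KinesisStreamsSource<String> kdsSource =
+        KinesisStreamsSource.<String>builder()
+                .setStreamArn("arn:aws:kinesis:us-east-1:123456789012:stream/my-stream")
+                .setSourceConfig(sourceConfig)
+                .setDeserializationSchema(new SimpleStringSchema())
+                .build();
+
+// Attach the source; watermarking is handled by the Flink framework.
+env.fromSource(kdsSource, WatermarkStrategy.noWatermarks(), "Kinesis source");
+```
+
+The `DynamoDbStreamsSource` follows the same builder pattern via `DynamoDbStreamsSourceBuilder`.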
+
+
+## Why did we need new source connectors?
+
+We implemented new source connectors because the legacy `FlinkKinesisConsumer` and `FlinkDynamoDBStreamsConsumer` use the deprecated `SourceFunction` interface, which has been removed in Flink 2.x. From Flink 2.x onwards, only the `KinesisStreamsSource` and `DynamoDbStreamsSource`, which use the new `Source` interface, will be supported.
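+
+For context, here is a hedged before-and-after sketch of how the two APIs attach to a job: the legacy consumer goes through `addSource` (a `SourceFunction`), while the new source goes through `fromSource`. The stream name and region are placeholders, and `env` and `kdsSource` come from the earlier sketch.
+
+```java
+import java.util.Properties;
+import org.apache.flink.api.common.eventtime.WatermarkStrategy;
+import org.apache.flink.api.common.serialization.SimpleStringSchema;
+import org.apache.flink.streaming.api.datastream.DataStream;
+import org.apache.flink.streaming.connectors.kinesis.FlinkKinesisConsumer;
+import org.apache.flink.streaming.connectors.kinesis.config.AWSConfigConstants;
+
+// Legacy: SourceFunction-based consumer, removed in Flink 2.x.
+Properties consumerConfig = new Properties();
+consumerConfig.put(AWSConfigConstants.AWS_REGION, "us-east-1");
+DataStream<String> legacyStream =
+        env.addSource(new FlinkKinesisConsumer<>(
+                "my-stream", new SimpleStringSchema(), consumerConfig));
+
+// New: Source-based connector, supported from Flink 2.x onwards.
+DataStream<String> newStream =
+        env.fromSource(kdsSource, WatermarkStrategy.noWatermarks(), "Kinesis source");
+```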
+
+## New features
+
+The updated `KinesisStreamsSource` and `DynamoDbStreamsSource` connectors offer the following new features:
+
+1. **Native Flink watermark integration.** On the new `Source` interface, watermark generation is abstracted away into the Flink framework and is no longer a responsibility of the source. This means the new sources support watermark alignment and idle watermark handling out of the box (see the sketch after this list).
+2. **Standardised Flink Source metrics.** The new `Source` framework also introduces [standardised Source metrics](https://cwiki.apache.org/confluence/display/FLINK/FLIP-33%3A+Standardize+Connector+Metrics). This enables users to track record throughput and lag across sources in a standardised manner.
+3. **Records are read in order, even after a resharding operation on the stream.** The new `Source` ensures that parent shards are read completely before their child shards, so record ordering is maintained even after a resharding operation. See the [explanation of record ordering in Kinesis Data Streams](#appendix-detailed-explanation-of-record-ordering) for more information.
+4. **Migration from AWS SDK v1 to AWS SDK v2.** This SDK update aligns with AWS best practices.
+5. **Migration from custom retry strategies to the AWS SDK's native retry strategies.** This allows the connectors to benefit from the AWS SDK's error classification in the retry algorithm.
+6. **Reduced jar size by >99%, from [~60MB](https://mvnrepository.com/artifact/org.apache.flink/flink-connector-kinesis/5.0.0-1.20) to [~200KB](https://mvnrepository.com/artifact/org.apache.flink/flink-connector-aws-kinesis-streams/5.0.0-1.20).** The new source no longer shades the AWS SDK and no longer packages the Kinesis Producer Library (which was only needed for the legacy sink). This leads to smaller Flink application jars.
+7. **Improved defaults.** The `UniformShardAssigner`, which uses information about the shard hash key range to distribute shards evenly across subtasks even after a resharding event, is now the default shard assigner.
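+
+To make feature 1 concrete, here is a minimal sketch of attaching the new source with watermark alignment and idleness handling, using only generic Flink `WatermarkStrategy` APIs rather than connector-specific code. The group name, drift, and idleness values are illustrative, and `env` and `kdsSource` come from the earlier sketch.
+
+```java
+import java.time.Duration;
+import org.apache.flink.api.common.eventtime.WatermarkStrategy;
+
+WatermarkStrategy<String> watermarkStrategy =
+        WatermarkStrategy.<String>forBoundedOutOfOrderness(Duration.ofSeconds(5))
+                // Align watermarks of all sources in the same alignment group,
+                // allowing at most 20 seconds of drift between them.
+                .withWatermarkAlignment("kinesis-sources", Duration.ofSeconds(20))
+                // Mark a shard idle after 30 seconds without records so that
+                // it does not hold back the watermark.
+                .withIdleness(Duration.ofSeconds(30));
+
+env.fromSource(kdsSource, watermarkStrategy, "Kinesis source with aligned watermarks");
+```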

Review Comment:
   Adjusted


