[
https://issues.apache.org/jira/browse/FLINK-24228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zichen Liu updated FLINK-24228:
-------------------------------
Description:
h2. Motivation
*User stories:*
As a Flink user, I’d like to use Kinesis Firehose as a sink for my data pipeline.
*Scope:*
* Implement an asynchronous sink for Kinesis Firehose by extending the
AsyncSinkBase class. The implementation can, for now, reside in its own module in
flink-connectors. The module and package name can be anything reasonable, e.g.
{{flink-connector-aws-kinesis}} for the module name and
{{org.apache.flink.connector.aws.kinesis}} for the package name.
* The implementation must use [the Kinesis Java
Client|https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/kinesis/AmazonKinesisClient.html].
* The implementation must allow users to configure the Kinesis Client, with
reasonable default settings.
* Implement an asynchronous sink writer for Firehose by extending the
AsyncSinkWriter. The implementation must deal with failed requests and retry
them using the {{requeueFailedRequestEntry}} method. If possible, the
implementation should batch multiple requests (PutRecordsRequestEntry objects)
to Firehose for increased throughput. The implemented Sink Writer will be used
by the Sink class that will be created as part of this story. (A rough sketch
follows this list.)
* Unit/Integration testing. Use Kinesalite (an in-memory Kinesis simulation). We
already use this in {{KinesisTableApiITCase}}.
* Java / code-level docs.
* End-to-end testing: add tests that hit a real AWS instance. (How best to
donate resources to the Flink project to allow this to happen?)
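For illustration, below is a minimal sketch of what such a sink writer could look like. It assumes the AsyncSinkWriter contract described in FLIP-171 (a {{submitRequestEntries}} hook plus a callback that hands failed entries back for retry, in line with the later change that made the failed-request handler accept a List) and uses the AWS SDK v2 {{FirehoseAsyncClient}} and its {{Record}} type rather than the KDS {{PutRecordsRequestEntry}} mentioned above. All class names, constructor parameters and signatures here are assumptions and may differ from the final flink-connector-base API.
{code:java}
// Illustrative sketch only - names and signatures are assumptions, not the final API.
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;

import org.apache.flink.api.connector.sink2.Sink;
import org.apache.flink.connector.base.sink.writer.AsyncSinkWriter;
import org.apache.flink.connector.base.sink.writer.ElementConverter;

import software.amazon.awssdk.services.firehose.FirehoseAsyncClient;
import software.amazon.awssdk.services.firehose.model.PutRecordBatchRequest;
import software.amazon.awssdk.services.firehose.model.PutRecordBatchResponseEntry;
import software.amazon.awssdk.services.firehose.model.Record;

class FirehoseSinkWriter<InputT> extends AsyncSinkWriter<InputT, Record> {

    private final FirehoseAsyncClient client;       // user-configurable, with sensible defaults
    private final String deliveryStreamName;

    FirehoseSinkWriter(
            ElementConverter<InputT, Record> elementConverter,
            Sink.InitContext context,
            int maxBatchSize,
            int maxInFlightRequests,
            int maxBufferedRequests,
            long maxBatchSizeInBytes,
            long maxTimeInBufferMS,
            long maxRecordSizeInBytes,
            FirehoseAsyncClient client,
            String deliveryStreamName) {
        super(elementConverter, context, maxBatchSize, maxInFlightRequests, maxBufferedRequests,
                maxBatchSizeInBytes, maxTimeInBufferMS, maxRecordSizeInBytes);
        this.client = client;
        this.deliveryStreamName = deliveryStreamName;
    }

    @Override
    protected void submitRequestEntries(List<Record> entries, Consumer<List<Record>> requestResult) {
        // Batch all buffered entries into a single PutRecordBatch call for throughput.
        PutRecordBatchRequest request =
                PutRecordBatchRequest.builder()
                        .deliveryStreamName(deliveryStreamName)
                        .records(entries)
                        .build();

        client.putRecordBatch(request).whenComplete((response, error) -> {
            if (error != null) {
                // The whole batch failed: hand every entry back so the base sink retries it.
                requestResult.accept(entries);
            } else if (response.failedPutCount() > 0) {
                // Partial failure: requeue only the entries Firehose rejected.
                List<Record> toRetry = new ArrayList<>();
                List<PutRecordBatchResponseEntry> results = response.requestResponses();
                for (int i = 0; i < results.size(); i++) {
                    if (results.get(i).errorCode() != null) {
                        toRetry.add(entries.get(i));
                    }
                }
                requestResult.accept(toRetry);
            } else {
                // Everything was accepted; signal completion with nothing to retry.
                requestResult.accept(Collections.emptyList());
            }
        });
    }

    @Override
    protected long getSizeInBytes(Record record) {
        return record.data().asByteArrayUnsafe().length;
    }
}
{code}
The client instance passed in would be built from user-supplied configuration with reasonable defaults, keeping the writer itself free of credential and endpoint handling.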
h2. References
More details can be found at
[https://cwiki.apache.org/confluence/display/FLINK/FLIP-171%3A+Async+Sink]
was:
h2. Motivation
*User stories:*
As a Flink user, I’d like to use Kinesis Firehose as a sink for my data pipeline.
*Scope:*
* Implement an asynchronous sink for Kinesis Firehose by extending the
AsyncSinkBase class. The implementation can, for now, reside in its own module in
flink-connectors. The module and package name can be anything reasonable, e.g.
{{flink-connector-aws-kinesis}} for the module name and
{{org.apache.flink.connector.aws.kinesis}} for the package name.
* The implementation must use [the Kinesis Java
Client|https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/kinesis/AmazonKinesisClient.html].
* The implementation must allow users to configure the Kinesis Client, with
reasonable default settings.
* Implement an asynchronous sink writer for Firehose by extending the
AsyncSinkWriter. The implementation must deal with failed requests and retry
them using the {{requeueFailedRequestEntry}} method. If possible, the
implementation should batch multiple requests (PutRecordsRequestEntry objects)
to Firehose for increased throughput. The implemented Sink Writer will be used
by the Sink class that will be created as part of this story.
* Unit/Integration testing. Use Kinesalite (an in-memory Kinesis simulation). We
already use this in {{KinesisTableApiITCase}}.
* Java / code-level docs.
* End-to-end testing: add tests that hit a real AWS instance. (How best to
donate resources to the Flink project to allow this to happen?)
h2. References
More details can be found at
[https://cwiki.apache.org/confluence/display/FLINK/FLIP-171%3A+Async+Sink]
*Update:*
^^ Status Update ^^
_List of all work outstanding for the 1.15 release_
[Merged] [https://github.com/apache/flink/pull/18165] - KDS DataStream Docs
[Merged] [https://github.com/apache/flink/pull/18396] - [hotfix] for infinite
loop if not flushing during commit
[Merged] [https://github.com/apache/flink/pull/18421] - Mark Kinesis Producer
as deprecated (Prod: FLINK-24227)
[Merged] [https://github.com/apache/flink/pull/18348] - KDS Table API Sink &
Docs
[Merged] [https://github.com/apache/flink/pull/18488] - base sink retries entries
in order, not in reverse
[Merged] [https://github.com/apache/flink/pull/18512] - changing failed
requests handler to accept List in AsyncSinkWriter
[Merged] [https://github.com/apache/flink/pull/18483] - Do not expose the
element converter
[Merged] [https://github.com/apache/flink/pull/18468] - Adding Kinesis data
streams sql uber-jar
Ready for review:
[SUCCESS] [https://github.com/apache/flink/pull/18314] - KDF DataStream Sink &
Docs
[BLOCKED on the PR above] [https://github.com/apache/flink/pull/18426] - rename
flink-connector-aws-kinesis-data-* to flink-connector-aws-kinesis-* (module
names) and KinesisData*Sink to Kinesis*Sink (class names)
Pending PR:
* Firehose Table API Sink & Docs
* KDF Table API SQL jar
TBD:
* FLINK-25846: Not shutting down
* FLINK-25848: Validation during start up
* FLINK-25792: flush() bug
* FLINK-25793: throughput exceeded
* Update the defaults of the KDS sink and update the docs; do the same for KDF
* add an `AsyncSinkCommonConfig` class (to wrap the 6 params) to the
`flink-connector-base` module and propagate it to the two connectors (see the
sketch after this list)
------------- feature freeze
* KDS performance testing
* KDF performance testing
* Clone the new docs to .../contents.zh/... and add the location to the
corresponding Chinese translation jira - KDS -
* Rename [Amazon AWS Kinesis Streams] to [Amazon Kinesis Data Streams] in docs
(legacy issue)
------------- Flink 1.15 release
* KDS end-to-end sanity test - hits the AWS APIs rather than local Docker images
* KDS Python wrappers
* FLINK-25733 - Create a migration guide for the Kinesis Table API connector - can
happen after 1.15
* If `endpoint` is provided, `region` should not be required (as it currently
is)
* Test if Localstack container requires the 10000ms timeout
* Adaptive level of logging (in discussion)
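As a rough illustration of the `AsyncSinkCommonConfig` idea above, the wrapper could look something like the following. The six parameter names are taken from the AsyncSinkWriter constructor used in the sketch further up; the class and accessor names are placeholders, not an agreed API.
{code:java}
// Hypothetical wrapper for the common AsyncSinkWriter tuning parameters - all names are placeholders.
import java.io.Serializable;

public final class AsyncSinkCommonConfig implements Serializable {

    private final int maxBatchSize;
    private final int maxInFlightRequests;
    private final int maxBufferedRequests;
    private final long maxBatchSizeInBytes;
    private final long maxTimeInBufferMS;
    private final long maxRecordSizeInBytes;

    public AsyncSinkCommonConfig(
            int maxBatchSize,
            int maxInFlightRequests,
            int maxBufferedRequests,
            long maxBatchSizeInBytes,
            long maxTimeInBufferMS,
            long maxRecordSizeInBytes) {
        this.maxBatchSize = maxBatchSize;
        this.maxInFlightRequests = maxInFlightRequests;
        this.maxBufferedRequests = maxBufferedRequests;
        this.maxBatchSizeInBytes = maxBatchSizeInBytes;
        this.maxTimeInBufferMS = maxTimeInBufferMS;
        this.maxRecordSizeInBytes = maxRecordSizeInBytes;
    }

    public int getMaxBatchSize() { return maxBatchSize; }
    public int getMaxInFlightRequests() { return maxInFlightRequests; }
    public int getMaxBufferedRequests() { return maxBufferedRequests; }
    public long getMaxBatchSizeInBytes() { return maxBatchSizeInBytes; }
    public long getMaxTimeInBufferMS() { return maxTimeInBufferMS; }
    public long getMaxRecordSizeInBytes() { return maxRecordSizeInBytes; }
}
{code}
Each connector builder could then accept one such object instead of six individual parameters.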
FYI:
* FLINK-25661 - Add Custom Fatal Exception handler in AsyncSinkWriter -
[https://github.com/apache/flink/pull/18449]
* https://issues.apache.org/jira/browse/FLINK-24229 DDB Sink
Chinese translation:
https://issues.apache.org/jira/browse/FLINK-25735 - KDS DataStream Sink
https://issues.apache.org/jira/browse/FLINK-25736 - KDS Table API Sink
https://issues.apache.org/jira/browse/FLINK-25737 - KDF DataStream Sink
> [FLIP-171] Firehose implementation of Async Sink
> ------------------------------------------------
>
> Key: FLINK-24228
> URL: https://issues.apache.org/jira/browse/FLINK-24228
> Project: Flink
> Issue Type: New Feature
> Components: Connectors / Kinesis
> Reporter: Zichen Liu
> Assignee: Zichen Liu
> Priority: Major
> Labels: pull-request-available, stale-assigned
> Fix For: 1.15.0
>
>
> h2. Motivation
> *User stories:*
> As a Flink user, I’d like to use Kinesis Firehose as a sink for my data
> pipeline.
> *Scope:*
> * Implement an asynchronous sink for Kinesis Firehose by extending the
> AsyncSinkBase class. The implementation can, for now, reside in its own module
> in flink-connectors. The module and package name can be anything reasonable,
> e.g. {{flink-connector-aws-kinesis}} for the module name and
> {{org.apache.flink.connector.aws.kinesis}} for the package name.
> * The implementation must use [the Kinesis Java
> Client|https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/kinesis/AmazonKinesisClient.html].
> * The implementation must allow users to configure the Kinesis Client, with
> reasonable default settings.
> * Implement an asynchronous sink writer for Firehose by extending the
> AsyncSinkWriter. The implementation must deal with failed requests and retry
> them using the {{requeueFailedRequestEntry}} method. If possible, the
> implementation should batch multiple requests (PutRecordsRequestEntry
> objects) to Firehose for increased throughput. The implemented Sink Writer
> will be used by the Sink class that will be created as part of this story.
> * Unit/Integration testing. Use Kinesalite (an in-memory Kinesis simulation).
> We already use this in {{KinesisTableApiITCase}}.
> * Java / code-level docs.
> * End-to-end testing: add tests that hit a real AWS instance. (How best to
> donate resources to the Flink project to allow this to happen?)
> h2. References
> More details can be found at
> [https://cwiki.apache.org/confluence/display/FLINK/FLIP-171%3A+Async+Sink]
--
This message was sent by Atlassian Jira
(v8.20.1#820001)