[
https://issues.apache.org/jira/browse/FLINK-24229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17450520#comment-17450520
]
Yuri Gusev commented on FLINK-24229:
------------------------------------
We are working on it; the implementation is almost there, and we will submit a
PR for review, maybe this week or at the start of next week. In our previous
implementation we deduplicated entries during aggregation of a batch, but now
we need to move this into the writer itself, because batching is done in the
AsyncSinkWriter.
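For context on why deduplication matters here: DynamoDB's BatchWriteItem rejects a batch that contains more than one operation against the same primary key, so the writer has to collapse duplicates in each batch it assembles. A minimal, Flink-free sketch of that idea (all names here are hypothetical illustrations, not the actual connector API), keeping only the latest entry per key:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class BatchDeduplicator {

    /**
     * Keeps only the last entry seen for each key, preserving the order in
     * which keys first appeared. DynamoDB's BatchWriteItem fails the whole
     * request if two entries target the same primary key, so duplicates must
     * be collapsed before the batch is submitted.
     */
    public static <T, K> List<T> dedupeByKey(List<T> entries, Function<T, K> keyExtractor) {
        Map<K, T> latestPerKey = new LinkedHashMap<>();
        for (T entry : entries) {
            latestPerKey.put(keyExtractor.apply(entry), entry); // later entries win
        }
        return new ArrayList<>(latestPerKey.values());
    }

    public static void main(String[] args) {
        List<String> batch = List.of("user1:v1", "user2:v1", "user1:v2");
        // Key is the part before the colon; the later write for user1 wins.
        List<String> deduped = dedupeByKey(batch, s -> s.split(":")[0]);
        System.out.println(deduped); // [user1:v2, user2:v1]
    }
}
```

Keeping the *last* entry per key matches upsert semantics: a later write to the same item supersedes an earlier one within the batch.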
I'm not sure I followed your answer on the fatal exception behaviour.
What we would like to do is allow the user to define what happens in the end
(for example, after all retries towards DynamoDB for the current batch have
been exhausted).
This is how we achieved it in the old implementation:
[WriteRequestFailureHandler.java|https://github.com/YuriGusev/flink/blob/FLINK-16504_dynamodb_connector_rebased/flink-connectors/flink-connector-dynamodb/src/main/java/org/apache/flink/streaming/connectors/dynamodb/WriteRequestFailureHandler.java],
[DynamoDbSink.java|https://github.com/YuriGusev/flink/blob/8787c343b615602c989fa793e0f4687ef40e530c/flink-connectors/flink-connector-dynamodb/src/main/java/org/apache/flink/streaming/connectors/dynamodb/DynamoDbSink.java#L355]
At the moment it sends the failed records back to the queue and retries them,
or, on a fatal exception, it propagates the error and stops the application.
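The handler linked above boils down to a per-batch decision: requeue the failed entries for another attempt, or escalate and stop the job. A self-contained sketch of that retry/escalate loop (plain Java, no Flink; `FailureHandler`, `writeAttempt`, and `maxAttempts` are illustrative names, not the connector's actual API):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.function.Predicate;

public class RetryingBatchWriter<T> {

    /** Illustrative callback: invoked once retries are exhausted. */
    public interface FailureHandler<T> {
        void onFailure(List<T> unrecoverable);
    }

    private final int maxAttempts;
    private final Predicate<T> writeAttempt; // returns true when the entry was persisted
    private final FailureHandler<T> failureHandler;

    public RetryingBatchWriter(int maxAttempts, Predicate<T> writeAttempt,
                               FailureHandler<T> failureHandler) {
        this.maxAttempts = maxAttempts;
        this.writeAttempt = writeAttempt;
        this.failureHandler = failureHandler;
    }

    /** Retries each failed entry up to maxAttempts, then hands leftovers to the handler. */
    public void write(List<T> batch) {
        Deque<T> pending = new ArrayDeque<>(batch);
        int attempt = 0;
        while (!pending.isEmpty() && attempt < maxAttempts) {
            Deque<T> failed = new ArrayDeque<>();
            for (T entry : pending) {
                if (!writeAttempt.test(entry)) {
                    failed.add(entry); // requeue for the next attempt
                }
            }
            pending = failed;
            attempt++;
        }
        if (!pending.isEmpty()) {
            failureHandler.onFailure(List.copyOf(pending));
        }
    }
}
```

A handler that throws reproduces the "propagate the error and stop the application" behaviour; a handler that logs and drops trades that for at-most-once delivery of the affected records.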
> [FLIP-171] DynamoDB implementation of Async Sink
> ------------------------------------------------
>
> Key: FLINK-24229
> URL: https://issues.apache.org/jira/browse/FLINK-24229
> Project: Flink
> Issue Type: New Feature
> Components: Connectors / Common
> Reporter: Zichen Liu
> Assignee: Zichen Liu
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.15.0
>
>
> h2. Motivation
> *User stories:*
> As a Flink user, I’d like to use DynamoDB as sink for my data pipeline.
> *Scope:*
> * Implement an asynchronous sink for DynamoDB by inheriting the
> AsyncSinkBase class. The implementation can for now reside in its own module
> in flink-connectors.
> * Implement an asynchronous sink writer for DynamoDB by extending the
> AsyncSinkWriter. The implementation must deal with failed requests and retry
> them using the {{requeueFailedRequestEntry}} method. If possible, the
> implementation should batch multiple write requests to DynamoDB for increased
> throughput. The implemented Sink Writer will be used by the Sink class that
> will be created as part of this story.
> * Java / code-level docs.
> * End to end testing: add tests that hit a real AWS instance. (How best to
> donate resources to the Flink project to allow this to happen?)
> h2. References
> More details to be found
> [https://cwiki.apache.org/confluence/display/FLINK/FLIP-171%3A+Async+Sink]
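The scope quoted above asks the writer to batch multiple requests for throughput; for DynamoDB this is bounded, because BatchWriteItem accepts at most 25 write requests per call, so a writer has to chunk its buffered entries. A minimal chunking sketch (plain Java, no Flink or AWS SDK; the class and method names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

public class BatchChunker {

    /** DynamoDB's BatchWriteItem accepts at most 25 write requests per call. */
    public static final int MAX_BATCH_SIZE = 25;

    /** Splits the buffered entries into sublists no larger than maxSize. */
    public static <T> List<List<T>> chunk(List<T> entries, int maxSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < entries.size(); start += maxSize) {
            int end = Math.min(start + maxSize, entries.size());
            batches.add(new ArrayList<>(entries.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> entries = new ArrayList<>();
        for (int i = 0; i < 60; i++) {
            entries.add(i);
        }
        // 60 buffered entries split at the 25-item limit -> batches of 25, 25 and 10.
        List<List<Integer>> batches = chunk(entries, MAX_BATCH_SIZE);
        System.out.println(batches.size());        // 3
        System.out.println(batches.get(2).size()); // 10
    }
}
```

Note that the 25-item cap is separate from the AsyncSinkWriter's own configurable batch size; the smaller of the two effectively applies.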
--
This message was sent by Atlassian Jira
(v8.20.1#820001)