[
https://issues.apache.org/jira/browse/FLINK-7622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16166925#comment-16166925
]
Bowen Li commented on FLINK-7622:
---------------------------------
[~tzulitai] I think FLINK-7508 indirectly solves this issue, since FLINK-7508
greatly increased the throughput of Kinesis producer. Here's an example:
https://imgur.com/a/u198h Here are the metrics of our prod Flink job -
UserRecords Put (aka UserRecords put into KPL queue), and user_records_pending
(aka outstanding UserRecords waiting to be sent to AWS in the queue).
When using per_request threading model, a small number (~0.5million) of
UserRecords can cause huge number (~15k) of records pending because the
throughput is so low. After switching to pooled threading model, you can see
the number of outstanding UserRecords has dropped significantly (~0) even
though the number of UserRecords put into the queue grow to 16X bigger
(~8million at peak). Take a closer look at our user_records_pending metric for
the past two weeks at https://imgur.com/a/2YxIm, the # of outstanding records
is consistently under 150 (impressive, right?).
Thus, I believe propagating back pressure to upstream for FlinkKinesisProducer
is not necessary anymore.
> Respect local KPL queue size in FlinkKinesisProducer when adding records to
> KPL client
> --------------------------------------------------------------------------------------
>
> Key: FLINK-7622
> URL: https://issues.apache.org/jira/browse/FLINK-7622
> Project: Flink
> Issue Type: Improvement
> Components: Kinesis Connector
> Reporter: Tzu-Li (Gordon) Tai
>
> This issue was brought to discussion by [~sthm] offline.
> Currently, records are added to the Kinesis KPL producer client without
> checking the number of outstanding records within the local KPL queue. This
> manner is basically neglecting backpressure when producing to Kinesis through
> KPL, and can therefore exhaust system resources.
> We should respect {{producer.getOutstandingRecordsCount()}} as a measure of
> backpressure, and propagate backpressure upstream by blocking further sink
> invocations when some threshold of outstanding record count is exceeded. The
> recommended threshold [1] seems to be 10,000.
> [1]
> https://aws.amazon.com/blogs/big-data/implementing-efficient-and-reliable-producers-with-the-amazon-kinesis-producer-library/
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)