[
https://issues.apache.org/jira/browse/FLINK-7508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Bowen Li updated FLINK-7508:
----------------------------
Description:
KinesisProducerLibrary (KPL) 0.10.x had been using a One-New-Thread-Per-Request
model for all requests sent to AWS Kinesis, which is very expensive.
0.12.4 introduced a new [ThreadingMode -
Pooled|https://github.com/awslabs/amazon-kinesis-producer/blob/master/java/amazon-kinesis-producer/src/main/java/com/amazonaws/services/kinesis/producer/KinesisProducerConfiguration.java#L225],
which will use a thread pool. This hugely improves KPL's performance and
reduces consumed resources. By default, KPL still uses per-request mode. We
should explicitly switch FlinkKinesisProducer's KPL threading mode to 'Pooled'.
This work depends on FLINK-7366 and FLINK-7508
Benchmarking I did:
* Environment: Running a Flink hourly-sliding windowing job on 18-node EMR
cluster with R4.2xlarge instances. Each hourly-sliding window in the Flink job
generates about 21million UserRecords, which means that we generated a test
load of 21million UserRecords at the first minute of each hour.
* Criteria: Test KPL throughput per minute. Since the default RecordTTL for KPL
is 30 sec, we can be sure that either all UserRecords are sent by KPL within a
minute, or we will see UserRecord expiration errors.
* One-New-Thread-Per-Request model: max throughput is about 2million
UserRecords per min; it doesn't go beyond that because CPU utilization goes to
100%, everything stopped working and that Flink job crashed.
* Thread-Pool model: it sends out 21million UserRecords within 30 sec without
any UserRecord expiration errors. The average peak CPU utilization is about 20%
- 30%. So 21million UserRecords/min is not the max throughput of thread-pool
model. We didn't go any further because 1) this throughput is already a couple
times more than what we really need, and 2) we don't have a quick way of
increasing the test load
Thus, I propose switching FlinkKinesisProducer to Thread-Pool mode. [~tzulitai]
What do you think
was:
KinesisProducerLibrary (KPL) 0.10.x had been using a One-New-Thread-Per-Request
model for all requests sent to AWS Kinesis, which is very expensive.
0.12.4 introduced a new [ThreadingMode -
Pooled|https://github.com/awslabs/amazon-kinesis-producer/blob/master/java/amazon-kinesis-producer/src/main/java/com/amazonaws/services/kinesis/producer/KinesisProducerConfiguration.java#L225],
which will use a thread pool. This hugely improves KPL's performance and
reduces consumed resources. By default, KPL still uses per-request mode. We
should explicitly switch FlinkKinesisProducer's KPL threading mode to 'Pooled'.
This work depends on FLINK-7366 and FLINK-7508
Benchmarking I did:
* Environment: Running a Flink hourly-sliding windowing job on 18-node EMR
cluster with R4.2xlarge instances. Each hourly-sliding window in the Flink job
generates about 21million UserRecords, which means that we generated a test
load of 21million UserRecords at the first minute of each hour.
* Criteria:
* One-New-Thread-Per-Request model: max throughput is about 2million
UserRecords per min; it doesn't go beyond that because CPU utilization goes to
100%, everything stopped working and that Flink job crashed.
* Thread-Pool model: it sends out 21million UserRecords within one minute. The
CPU utilization is about
> switch FlinkKinesisProducer to use KPL's ThreadingMode to ThreadedPool mode
> rather than Per_Request mode
> --------------------------------------------------------------------------------------------------------
>
> Key: FLINK-7508
> URL: https://issues.apache.org/jira/browse/FLINK-7508
> Project: Flink
> Issue Type: Improvement
> Components: Kinesis Connector
> Affects Versions: 1.3.0
> Reporter: Bowen Li
> Assignee: Bowen Li
> Priority: Critical
> Fix For: 1.4.0, 1.3.3
>
>
> KinesisProducerLibrary (KPL) 0.10.x had been using a
> One-New-Thread-Per-Request model for all requests sent to AWS Kinesis, which
> is very expensive.
> 0.12.4 introduced a new [ThreadingMode -
> Pooled|https://github.com/awslabs/amazon-kinesis-producer/blob/master/java/amazon-kinesis-producer/src/main/java/com/amazonaws/services/kinesis/producer/KinesisProducerConfiguration.java#L225],
> which will use a thread pool. This hugely improves KPL's performance and
> reduces consumed resources. By default, KPL still uses per-request mode. We
> should explicitly switch FlinkKinesisProducer's KPL threading mode to
> 'Pooled'.
> This work depends on FLINK-7366 and FLINK-7508
> Benchmarking I did:
> * Environment: Running a Flink hourly-sliding windowing job on 18-node EMR
> cluster with R4.2xlarge instances. Each hourly-sliding window in the Flink
> job generates about 21million UserRecords, which means that we generated a
> test load of 21million UserRecords at the first minute of each hour.
> * Criteria: Test KPL throughput per minute. Since the default RecordTTL for
> KPL is 30 sec, we can be sure that either all UserRecords are sent by KPL
> within a minute, or we will see UserRecord expiration errors.
> * One-New-Thread-Per-Request model: max throughput is about 2million
> UserRecords per min; it doesn't go beyond that because CPU utilization goes
> to 100%, everything stopped working and that Flink job crashed.
> * Thread-Pool model: it sends out 21million UserRecords within 30 sec without
> any UserRecord expiration errors. The average peak CPU utilization is about
> 20% - 30%. So 21million UserRecords/min is not the max throughput of
> thread-pool model. We didn't go any further because 1) this throughput is
> already a couple times more than what we really need, and 2) we don't have a
> quick way of increasing the test load
> Thus, I propose switching FlinkKinesisProducer to Thread-Pool mode.
> [~tzulitai] What do you think
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)