[
https://issues.apache.org/jira/browse/FLINK-10320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16620511#comment-16620511
]
陈梓立 commented on FLINK-10320:
-----------------------------
We have cases where customers set the parallelism above 10,000 because of
the amount of data to process. In such cases, the {{JobMaster}} can even become
unavailable because it is busy handling RPC requests or doing GC. This is the
original motivation for introducing a benchmark aimed at preventing regressions
in the scheduling module.
> Introduce JobMaster schedule micro-benchmark
> --------------------------------------------
>
> Key: FLINK-10320
> URL: https://issues.apache.org/jira/browse/FLINK-10320
> Project: Flink
> Issue Type: Improvement
> Components: Tests
> Reporter: 陈梓立
> Assignee: 陈梓立
> Priority: Major
>
> Based on the {{org.apache.flink.streaming.runtime.io.benchmark}} code and the
> repo [flink-benchmark|https://github.com/dataArtisans/flink-benchmarks], I
> propose introducing another micro-benchmark that focuses on {{JobMaster}}
> scheduling performance.
> h3. Target
> Benchmark the time from {{JobMaster}} startup (receiving the {{JobGraph}} and
> initializing) until all tasks are RUNNING. Technically, we use a bounded stream
> and the TM finishes tasks as soon as they arrive, so the interval we actually
> measure ends when all tasks are FINISHED.
> h3. Cases
> 1. A JobGraph that covers EAGER + PIPELINED edges
> 2. A JobGraph that covers LAZY_FROM_SOURCES + PIPELINED edges
> 3. A JobGraph that covers LAZY_FROM_SOURCES + BLOCKING edges
> P.S. Maybe also benchmark the case where the source input comes from {{InputSplit}}?
> h3. Implementation
> Following the flink-benchmark repo, the benchmark is ultimately run with JMH, so
> the whole test suite is split across two repos. The testing environment could
> live in the main repo, maybe under
> flink-runtime/src/test/java/org/apache/flink/runtime/jobmaster/benchmark.
> To measure the performance of {{JobMaster}} scheduling, we need to simulate
> an environment that:
> 1. has a real {{JobMaster}}
> 2. has a mock/testing {{ResourceManager}} that has infinite resources and
> reacts immediately.
> 3. has a (many?) mock/testing {{TaskExecutor}} that deploys and finishes tasks
> immediately.
> [[email protected]] [~GJL] [~pnowojski] could you please review this
> proposal to help clarify the goal and concrete details? Thanks in advance.
> Any suggestions are welcome.
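For illustration only, the measurement described in the quoted proposal could be sketched roughly as below. This is a minimal stdlib-only sketch, not Flink code: {{ScheduleBenchmarkSketch}}, {{InstantResourceManager}}, {{InstantTaskExecutor}} and {{scheduleAndMeasure}} are hypothetical names, the single-threaded executor merely stands in for the scheduling work of a real {{JobMaster}}, and an actual benchmark would presumably wrap the measured part in a JMH {{@Benchmark}} method instead of timing it by hand.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-ins only; none of these types are Flink classes.
class ScheduleBenchmarkSketch {

    /** Testing resource manager: unlimited slots, answers immediately. */
    static final class InstantResourceManager {
        private final AtomicInteger granted = new AtomicInteger();

        int requestSlot() {
            return granted.incrementAndGet();
        }

        int grantedSlots() {
            return granted.get();
        }
    }

    /** Testing task executor: a deployed task counts as FINISHED at once. */
    static final class InstantTaskExecutor {
        private final CountDownLatch finished;

        InstantTaskExecutor(int tasks) {
            finished = new CountDownLatch(tasks);
        }

        void deploy(int slot) {
            finished.countDown();
        }

        boolean awaitAllFinished(long timeout, TimeUnit unit) throws InterruptedException {
            return finished.await(timeout, unit);
        }
    }

    /** Schedules `parallelism` tasks; returns nanoseconds until all are FINISHED. */
    static long scheduleAndMeasure(int parallelism, InstantResourceManager rm) {
        InstantTaskExecutor te = new InstantTaskExecutor(parallelism);
        // One thread plays the role of the component doing the scheduling.
        ExecutorService scheduler = Executors.newSingleThreadExecutor();
        try {
            long start = System.nanoTime();
            for (int i = 0; i < parallelism; i++) {
                scheduler.execute(() -> te.deploy(rm.requestSlot()));
            }
            if (!te.awaitAllFinished(60, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
            return System.nanoTime() - start;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } finally {
            scheduler.shutdownNow();
        }
    }

    public static void main(String[] args) {
        InstantResourceManager rm = new InstantResourceManager();
        long nanos = scheduleAndMeasure(10_000, rm);
        System.out.println("scheduled " + rm.grantedSlots() + " tasks in "
                + TimeUnit.NANOSECONDS.toMillis(nanos) + " ms");
    }
}
```

Because the mock resource manager and task executor do no real work, whatever time is measured comes almost entirely from the scheduling side, which is the point of making them react immediately.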